On 6/5/19 7:27 AM, Claes Redestad wrote:
On 2019-06-05 15:49, Tagir Valeev wrote:
In particular, it's never mentioned in the HashSet, HashMap, or Collection spec that toArray implements fail-fast behavior; this is said only about the iterator() method.

that's not entirely true, since the @implSpec for toArray on
AbstractCollection states it is equivalent to iterating over the
collection[1].

So since the implementations you're changing inherit their current
implementation from AbstractCollection, and the iterators of HashMap are
specified to be fail-fast, that behavior can be argued to be specified
also for the affected toArray methods.

I'm not fundamentally objecting to the behavior change, but I do think
it needs careful review and a CSR (or at least plenty of reviewers
agreeing that one isn't needed).

OK, let's slow down here a bit.

The fail-fast, CME-throwing behavior is primarily of interest when iteration is external, that is, it's driven by the application making a succession of calls on the Iterator. The main concern arises when the application makes other modifications to the collection being iterated, between calls to the Iterator.
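
For instance, the scenario of interest looks roughly like this (a minimal sketch, using HashSet; the class name and values are just illustrative):

    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;

    public class FailFastDemo {
        public static void main(String[] args) {
            Set<String> set = new HashSet<>(Set.of("a", "b", "c"));
            Iterator<String> it = set.iterator();
            it.next();      // external iteration has started
            set.add("d");   // the application modifies the set between Iterator calls
            it.next();      // throws ConcurrentModificationException (fail-fast)
        }
    }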

There is always the possibility of modification by some other thread, and this is mentioned obliquely in the spec near "fail-fast", but given memory visibility issues it's pretty much impossible to make any concrete statements about what happens if modifications are made by another thread.

In the case of these methods, the iteration occurs entirely within the context of a single method call. There's no possibility of the application making a concurrent modification on the same thread. So, while the fail-fast behavior is being changed, strictly speaking, I consider it "de minimis", as it's pretty difficult for an application to observe this change in behavior.

Regarding the @implSpec, that applies only to the implementation in AbstractCollection, and @implSpec is not inherited. Again, it is a change in behavior since this method is being changed from being inherited to being overridden, but it's not a specification issue.

In any case I don't think the concurrent modification behavior change is an issue to be concerned about.

**

Now, regarding visible changes in behavior, it's quite easy to observe this change in behavior, at least from a subclass of HashSet. A HashSet subclass could override the iterator() method and detect that it was called when toArray() is called. With this change, toArray() is overridden, and so iterator() would no longer be called in this circumstance.
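
For example, something along these lines (a hypothetical subclass, just to make the observable difference concrete):

    import java.util.HashSet;
    import java.util.Iterator;

    public class CountingHashSet<E> extends HashSet<E> {
        int iteratorCalls;

        @Override
        public Iterator<E> iterator() {
            iteratorCalls++;              // record each call to iterator()
            return super.iterator();
        }

        public static void main(String[] args) {
            CountingHashSet<String> set = new CountingHashSet<>();
            set.add("a");
            set.toArray();
            // With toArray() inherited from AbstractCollection this prints 1;
            // once HashSet overrides toArray(), iterator() is no longer called
            // here and it prints 0.
            System.out.println(set.iteratorCalls);
        }
    }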

This kind of change is generally allowable, but we have had complaints about the pattern of self-calls changing from release to release. This isn't a reason NOT to make the change, but it does change visible behavior, and that potentially affects actual code. (Code that I'd consider poorly written, but still.)

I talked this over with Joe Darcy (CSR chair), and we felt that it would be prudent to file a CSR request to document the behavior change.

**

Some comments on the code.

Overall I think the changes are going in the right direction. It's amazing that after all this time, there are still cases in core collections that are inheriting the slow Abstract* implementations.

In HashMap, I think the factoring between keysToArray() and prepareArray() can be improved. The prepareArray() method itself is sensible, in that it implements the weird toArray(T[]) semantics: it allocates a new array if the given one is too short, puts a 'null' at the right place if the given array is too long, and returns the possibly reallocated array.
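
For reference, those semantics amount to something like the following (a sketch only, written here as a static helper that takes the size explicitly; the method in the patch is an instance method and may differ in details):

    import java.lang.reflect.Array;

    final class ToArraySupport {
        @SuppressWarnings("unchecked")
        static <T> T[] prepareArray(T[] a, int size) {
            if (a.length < size) {
                // too short: allocate a new array of the same component type
                return (T[]) Array.newInstance(a.getClass().getComponentType(), size);
            }
            if (a.length > size) {
                // too long: store a null just past the last element
                a[size] = null;
            }
            return a;
        }
    }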

The first thing that keysToArray() does is to call prepareArray() and use its return value. What concerns me is that the no-arg toArray() method does this:

    return keysToArray(new Object[size]);

so the array is created with exactly the right size; yet the first thing keysToArray() does is to call prepareArray(), which checks the size again and determines that nothing need be done.

It would be a small optimization to shift things around so that keysToArray() takes an array that is already known to be "prepared", remove its call to prepareArray(), and require the caller to call prepareArray() if necessary. Then we'd have this:

    public Object[] toArray() { return keysToArray(new Object[size]); }
    public <T> T[] toArray(T[] a) { return keysToArray(prepareArray(a)); }
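
And keysToArray() itself would then just fill the array it's given, along the lines of the following (a rough sketch using HashMap's internal table/Node fields, so not standalone code, and the actual patch may differ):

    <T> T[] keysToArray(T[] a) {
        Object[] r = a;
        Node<K,V>[] tab;
        int idx = 0;
        if (size > 0 && (tab = table) != null) {
            for (Node<K,V> e : tab) {
                for (; e != null; e = e.next) {
                    // the array is already "prepared", so no size checks here
                    r[idx++] = e.key;
                }
            }
        }
        return a;
    }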

I'm not terribly concerned about avoiding an extra method call. But other code in this file, like the following:

    Node<K,V>[] tab;
    int idx = 0;
    if (size > 0 && (tab = table) != null) {

is written with an embedded assignment to 'tab', probably in order to avoid a useless load of the 'table' field if the size is zero. (This style occurs in several places.) So, if this code is already micro-optimizing at this level, let's not waste that effort by making unnecessary extra method calls. (I don't know whether this will impact the benchmarks, though.)

Aside from performance, refactoring keysToArray/prepareArray this way makes more sense to me anyway.

Similar adjustments would need to be made to the call sites in HashSet and LinkedHashMap.
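
For HashSet that would presumably look something like this, delegating to the backing map (again only a sketch; details may differ in the actual patch):

    public Object[] toArray() {
        return map.keysToArray(new Object[map.size()]);
    }

    public <T> T[] toArray(T[] a) {
        return map.keysToArray(map.prepareArray(a));
    }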

I'd recommend adding doc comments to HashMap's prepareArray() and keysToArray(), since they are called or overridden from outside this file. They needn't be full-on spec comments, just enough to make it clear that other things depend on these internal interfaces.
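
Something short would do, e.g. for prepareArray() (the wording below is only a suggestion):

    /**
     * Prepares the given array for a toArray(T[]) implementation: if it is
     * shorter than this map's size, a new array of the same component type
     * is allocated; if it is longer, a null is stored just past the last
     * element. Note: also used from outside this class (e.g. HashSet).
     */

A similar comment on keysToArray() could note that it is overridden outside this file.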

Thanks,

s'marks
