Hi Stuart,

I strongly disagree with this change. (Fortunately, I am not a Reviewer, just 
Opinionated.)

Summary:
  1. The change to removeAll() is a significant incompatible change because of 
its performance implications.
  2. The benefit of the change to removeAll() is restricted to a few edge cases 
which arguably are not bugs (with respect to the current documentation). 
  3. The benefit does not outweigh the cost.

In addition, the new concept of "membership semantics" is inadequately defined.

In detail:

I would describe the state of Collections before this change as:

An object supporting the Collection interface either:

1. has a state that identifies a set of member objects

where members can be identified by the results of iteration

where object identification refers to the usual class-based equals/hashCode 
concept

where set is the mathematical notion

or

2. is an ersatz Collection that adopts the Collection interface because the 
object supports many useful Collection operations, but not all operations obey 
#1

The existence of ersatz Collections in the JDK is not ideal, but with the 
understanding that the effects of Collection operations on ersatz Collections 
is undefined (implementation dependent), it can be made to work. With this 
understanding, the "semantic shift" of removeAll() is simply a reflection of 
the undefined (implementation dependent) behavior of ersatz Collections and is 
not a bug.

The revised design tries to improve this situation by defining a new category 
of Collection, namely, a Collection whose membership semantics are defined in a 
non-standard way (not compatible with equals and hashCode). My understanding is 
that the ersatz Collections defined by the JDK are all Sets that have either 
stronger or weaker notions of member identity. Hence, the proposal has the 
potential of replacing ersatz Collections with non-standard, but well behaved 
Collections. Indeed, a claim is made that Collections with non-standard 
membership semantics are well defined and an implication is made that the only 
possible issue is that the results may be "counterintuitive" when Collections 
with different membership semantics are mixed.

This sounds like an improvement, except for two things:

1. I don't think removeAll() is well defined.

The definition says: "Removes all of this collection's elements that are also 
contained in the specified collection."

What are the elements of a Collection with a non-standard membership semantics? 
Are they all of the objects for which contains() returns true? Are they only 
the objects returned by the iterator? Is the Collection required to define a 
canonical set of objects that the iterator must return? Can an iterator return 
two non-equal objects that are equivalent according to its membership 
semantics? What happens if the specified collection contains one of these 
objects but not the other?

The definition says: "After this call returns, this collection will contain no 
elements in common with the specified collection."

That sounds like Collections.disjoint(), which is undefined in the mixed 
membership semantics case.

If the statement is meaningful, why isn't Collections.disjoint() defined in 
this case?

Alternatively, why is it OK for Collections.disjoint() to be undefined, but not 
removeAll()?

2. There will be serious performance hits.

To me, the troubling consequence is that using removeAll() to remove a small 
number of elements from a large set implemented using hashing goes from fast to 
very slow. This consequence does not appear to be covered by the current 
implementation note. It has nothing to do with the performance of contains() on 
the parameter, but on the need to test every member of the target.

I will need to review every use of removeAll() in my code and convert to 
iteration where appropriate. This is a significant incompatibility!

Other comments (if the above issues are discounted):

The concept of non-standard membership semantics is introduced (in Collection) 
without any previous definition of standard membership semantics or any 
specification of the requirements for a valid membership semantics. A linkage 
to the contains() method is implied, but not explained.

Classes such as SortedSet and TreeSet that currently can be used to create 
ersatz Collections should have their documentation changed to say that they may 
have non-standard membership semantics.

There should be some guidance to developers on how to write code that operates 
in a well defined manner on both standard and non-standard Collections, 
including multiple Collections with mixed membership semantics.

The definition of List.contains() rules out a List class with non-standard 
membership semantics. Is this necessary?

Given that this is a seriously incompatible change (in performance), why not 
add a method to reveal whether a Collection is using non-standard membership 
semantics? If this method returns false, then the Collection is either standard 
or ersatz, and in the latter case, generic behavior remains undefined, so the 
old removeAll() optimization can be used.

Alternatively, if the behavior of removeAll with a non-standard parameter is 
important, it can be supported by a filter(predicate) operation.

Regards,

  Alan


> On May 13, 2019, at 5:28 PM, Stuart Marks <stuart.ma...@oracle.com> wrote:
> 
> Hi all,
> 
> Here's an updated webrev for this change. The general direction of the change 
> is the same as before. Differences are primarily additional wording in a few 
> relevant places, plus editorial and markup tweaks. Highlights of changes from 
> the previous webrev include:
> 
> - addition of FIXME comment and reference to javadoc bug report, where doc 
> comment from interface cannot be inherited
> - tuned up wording around operations that depend on membership semantics of 
> more than one collection
> - updated wording about membership semantics in Collections.disjoint()
> - added performance-related @implNote to List.removeAll/retainAll
> - various markup fixes
> 
> Webrev:
> 
>    http://cr.openjdk.java.net/~smarks/reviews/6394757/webrev.1/
> 
> Bug:
> 
>    https://bugs.openjdk.java.net/browse/JDK-6394757
> 
> Previous review thread:
> 
>    http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060007.html
> 
> Thanks,
> 
> s'marks
> 

Reply via email to