Re: RFR (M): JDK-6394757: rev1: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Alan Snyder Fri, 17 May 2019 08:52:26 -0700

Hi Peter,

I agree that the size based optimization is problematic.


What I would most like to preserve is the performance advantages of using hash 
coding, which apparently was a goal of the original design:

"implementations of the various Collections Framework interfaces are free to 
take advantage of the specified behavior of underlying {@link Object} methods 
wherever the implementor deems it appropriate"

Promoting ersatz Collections to first class citizens would be a significant 
incompatible change to the specification because it invalidates this statement.

I note that the statement refers to *all* implementations of the Collections 
Framework interface, not just JDK implementations, so a JDK-internal solution 
is inadequate.

I agree that some way of identifying ersatz Collections at run time could be 
useful. It’s tricky, however, as SortedSets and TreeSets are ersatz based on 
the behavior of the comparator/compare method, which is not discoverable, which 
means the application must provide that information.

  Alan









> On May 16, 2019, at 11:48 PM, Peter Levart <peter.lev...@gmail.com> wrote:
> 
> Hi Alan,
> 
> I can sympathize with the performance loss aspect of this patch, but I 
> nevertheless think it is a move in the right direction. You must admit that 
> the performance optimization of AbstractSet.removeAll was a rather shaky one 
> (dependent on the relative sizes of the two collections) and as such very 
> prone to sudden performance spikes that could occur for seemingly unknown 
> reason. More importantly, the fact that the semantics of the method could 
> change drastically dependent on such thing as relative sizes of the two 
> collections was inexplicable. You may argue that the semantics of the method 
> is undefined if called upon or passed an ersatz Collection instance, but even 
> the undefined semantics may be consistent  - i.e. independent of such things 
> as relative sizes of both collections.
> 
> It would be a perfect world if this optimization was not there to begin with. 
> People would eventually find bottlenecks using this method and change their 
> code to use explicit iteration or such. There may be perfectly valid code out 
> there that uses conformant collection implementations that don't cause 
> AbstractSet.removeAll to behave inconsistently and such code after this patch 
> may perform badly. But it will perform consistently for cases that use ersatz 
> Collections and that is more important in my opinion.
> 
> As Stuart says, optimizations mustn't change semantics. So is there a 
> possibly narrower optimization, that doesn't change the semantics?
> 
> Here's a sketch of one:
> - create a marker interface (could be JDK-internal) and mark all conforming 
> Set implementations with it
> - use AbstractSet.removeAll optimization only when both collections implement 
> the marker interface
> 
> 
> Regards, Peter
> 
> On 5/17/19 7:06 AM, Alan Snyder wrote:
>> Hi Stuart,
>> 
>> I believe I already accepted the fact of ersatz Collections.
>> 
>> Could you explain the inconsistency in the specification that affects 
>> removeAll()? I don’t see it.
>> 
>> In the current specification, the classes that can create ersatz Collections 
>> all call out the ersatz Collections in their documentation.
>> 
>> SortedSet:
>> Note that the ordering maintained by a sorted set (whether or not an
>> explicit comparator is provided) must be <i>consistent with equals</i> if
>> the sorted set is to correctly implement the {@code Set} interface.
>> The behavior of a sorted set <i>is</i> well-defined even if its
>> ordering is inconsistent with equals; it just fails to obey the general
>> contract of the {@code Set} interface.
>> 
>> TreeSet is similar
>> 
>> IdentityHashMap:
>> 
>> This class is <i>not</i> a general-purpose {@code Map} implementation!  
>> While this class implements the {@code Map} interface, it
>> intentionally violates {@code Map's} general contract, which mandates the
>> use of the {@code equals} method when comparing objects.
>> 
>> IdentityHashMap.keySet:
>> 
>> While the object returned by this method implements the
>> {@code Set} interface, it does <i>not</i> obey {@code Set's} general
>> contract.  Like its backing map, the set returned by this method
>> defines element equality as reference-equality rather than
>> object-equality.  This affects the behavior of its {@code contains},
>> {@code remove}, {@code containsAll}, {@code equals}, and
>> {@code hashCode} methods.
>> 
>> The logical implication of not supporting the general contract is that 
>> Collection operations performed on an ersatz Collection may have
>> unexpected (implementation specific) effects, which implies that passing one 
>> of these ersatz Collections to a parameter of a Collection
>> type may have unexpected (implementation specific) effects.
>> 
>> In other words, whether removeAll is performed on an ersatz Collection or 
>> the parameter is an ersatz Collection, the effects may be
>> implementation specific and inconsistent with the specification of removeAll.
>> 
>> That is not to say that improving the design would not have benefit, of 
>> course it does. The bug reports you cite prove that.
>> However, the compatibility impact of the change needs to be taken into 
>> account.
>> 
>>   Alan
>> 
>> 
>> 
>>> On May 16, 2019, at 8:42 PM, Stuart Marks <stuart.ma...@oracle.com> wrote:
>>> 
>>> Hi Alan,
>>> 
>>> Whether you call them "ersatz" collections, "non-standard" collections, or 
>>> collections with different membership semantics, they have been around in 
>>> the collections framework from day one. This is perhaps unfortunate, and it 
>>> may be considered to be bad design, but it is a fact.
>>> 
>>> The issue is not with undefined or implementation-specific behavior. The 
>>> issue is that the specification as it stands is self-contradictory. In 
>>> particular, it assumes that certain operations are symmetric that in fact 
>>> are not, if you read other parts of the spec and derive logical conclusions 
>>> from them. This is what I'm trying to fix. This is not a mere a matter of 
>>> intellectual consistency. This state of affairs has real consequences. 
>>> There is a whole family of bugs linked to this one which complain both 
>>> about the performance issue and the semantic issue. All of these bugs are 
>>> directly related to the self-contradictory nature of the current 
>>> specification. With my changes, the spec is self-contradictory in fewer 
>>> places. (There is, however, more work to be done; see JDK-8190545.)
>>> 
>>> Your lead point, though, is about performance. The performance optimization 
>>> of AbstractSet.removeAll is one of those that assumes that the operation is 
>>> symmetric, when in fact it isn't. As a rule, optimizations mustn't change 
>>> semantics. Therefore, it has to go.
>>> 
>>> s'marks
>>> 
>

Re: RFR (M): JDK-6394757: rev1: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Reply via email to