Hi Peter,
I agree that the size based optimization is problematic.
What I would most like to preserve is the performance advantages of using hash
coding, which apparently was a goal of the original design:
"implementations of the various Collections Framework interfaces are free to
take advantage of the specified behavior of underlying {@link Object} methods
wherever the implementor deems it appropriate"
Promoting ersatz Collections to first class citizens would be a significant
incompatible change to the specification because it invalidates this statement.
I note that the statement refers to *all* implementations of the Collections
Framework interface, not just JDK implementations, so a JDK-internal solution
is inadequate.
I agree that some way of identifying ersatz Collections at run time could be
useful. It’s tricky, however, as SortedSets and TreeSets are ersatz based on
the behavior of the comparator/compare method, which is not discoverable, which
means the application must provide that information.
Alan
> On May 16, 2019, at 11:48 PM, Peter Levart <[email protected]> wrote:
>
> Hi Alan,
>
> I can sympathize with the performance loss aspect of this patch, but I
> nevertheless think it is a move in the right direction. You must admit that
> the performance optimization of AbstractSet.removeAll was a rather shaky one
> (dependent on the relative sizes of the two collections) and as such very
> prone to sudden performance spikes that could occur for seemingly unknown
> reason. More importantly, the fact that the semantics of the method could
> change drastically dependent on such thing as relative sizes of the two
> collections was inexplicable. You may argue that the semantics of the method
> is undefined if called upon or passed an ersatz Collection instance, but even
> the undefined semantics may be consistent - i.e. independent of such things
> as relative sizes of both collections.
>
> It would be a perfect world if this optimization was not there to begin with.
> People would eventually find bottlenecks using this method and change their
> code to use explicit iteration or such. There may be perfectly valid code out
> there that uses conformant collection implementations that don't cause
> AbstractSet.removeAll to behave inconsistently and such code after this patch
> may perform badly. But it will perform consistently for cases that use ersatz
> Collections and that is more important in my opinion.
>
> As Stuart says, optimizations mustn't change semantics. So is there a
> possibly narrower optimization, that doesn't change the semantics?
>
> Here's a sketch of one:
> - create a marker interface (could be JDK-internal) and mark all conforming
> Set implementations with it
> - use AbstractSet.removeAll optimization only when both collections implement
> the marker interface
>
>
> Regards, Peter
>
> On 5/17/19 7:06 AM, Alan Snyder wrote:
>> Hi Stuart,
>>
>> I believe I already accepted the fact of ersatz Collections.
>>
>> Could you explain the inconsistency in the specification that affects
>> removeAll()? I don’t see it.
>>
>> In the current specification, the classes that can create ersatz Collections
>> all call out the ersatz Collections in their documentation.
>>
>> SortedSet:
>> Note that the ordering maintained by a sorted set (whether or not an
>> explicit comparator is provided) must be <i>consistent with equals</i> if
>> the sorted set is to correctly implement the {@code Set} interface.
>> The behavior of a sorted set <i>is</i> well-defined even if its
>> ordering is inconsistent with equals; it just fails to obey the general
>> contract of the {@code Set} interface.
>>
>> TreeSet is similar
>>
>> IdentityHashMap:
>>
>> This class is <i>not</i> a general-purpose {@code Map} implementation!
>> While this class implements the {@code Map} interface, it
>> intentionally violates {@code Map's} general contract, which mandates the
>> use of the {@code equals} method when comparing objects.
>>
>> IdentityHashMap.keySet:
>>
>> While the object returned by this method implements the
>> {@code Set} interface, it does <i>not</i> obey {@code Set's} general
>> contract. Like its backing map, the set returned by this method
>> defines element equality as reference-equality rather than
>> object-equality. This affects the behavior of its {@code contains},
>> {@code remove}, {@code containsAll}, {@code equals}, and
>> {@code hashCode} methods.
>>
>> The logical implication of not supporting the general contract is that
>> Collection operations performed on an ersatz Collection may have
>> unexpected (implementation specific) effects, which implies that passing one
>> of these ersatz Collections to a parameter of a Collection
>> type may have unexpected (implementation specific) effects.
>>
>> In other words, whether removeAll is performed on an ersatz Collection or
>> the parameter is an ersatz Collection, the effects may be
>> implementation specific and inconsistent with the specification of removeAll.
>>
>> That is not to say that improving the design would not have benefit, of
>> course it does. The bug reports you cite prove that.
>> However, the compatibility impact of the change needs to be taken into
>> account.
>>
>> Alan
>>
>>
>>
>>> On May 16, 2019, at 8:42 PM, Stuart Marks <[email protected]> wrote:
>>>
>>> Hi Alan,
>>>
>>> Whether you call them "ersatz" collections, "non-standard" collections, or
>>> collections with different membership semantics, they have been around in
>>> the collections framework from day one. This is perhaps unfortunate, and it
>>> may be considered to be bad design, but it is a fact.
>>>
>>> The issue is not with undefined or implementation-specific behavior. The
>>> issue is that the specification as it stands is self-contradictory. In
>>> particular, it assumes that certain operations are symmetric that in fact
>>> are not, if you read other parts of the spec and derive logical conclusions
>>> from them. This is what I'm trying to fix. This is not a mere a matter of
>>> intellectual consistency. This state of affairs has real consequences.
>>> There is a whole family of bugs linked to this one which complain both
>>> about the performance issue and the semantic issue. All of these bugs are
>>> directly related to the self-contradictory nature of the current
>>> specification. With my changes, the spec is self-contradictory in fewer
>>> places. (There is, however, more work to be done; see JDK-8190545.)
>>>
>>> Your lead point, though, is about performance. The performance optimization
>>> of AbstractSet.removeAll is one of those that assumes that the operation is
>>> symmetric, when in fact it isn't. As a rule, optimizations mustn't change
>>> semantics. Therefore, it has to go.
>>>
>>> s'marks
>>>
>