Hi Peter,

I agree that the size-based optimization is problematic.
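For readers who don't have the source in front of them, the strategy in question is roughly the following. This is a paraphrase rather than the exact code in AbstractSet, written as a standalone helper (the class and method names are just illustrative) so it compiles on its own:

    import java.util.Collection;
    import java.util.Iterator;
    import java.util.Objects;
    import java.util.Set;

    // Paraphrase of the size-based strategy in AbstractSet.removeAll:
    // which collection's membership semantics govern the operation
    // depends on the relative sizes of the two collections.
    class RemoveAllSketch {
        static boolean removeAll(Set<?> self, Collection<?> c) {
            Objects.requireNonNull(c);
            boolean modified = false;
            if (self.size() > c.size()) {
                // Iterate the argument and call the receiver's remove():
                // membership is decided by the receiver's semantics.
                for (Object e : c)
                    modified |= self.remove(e);
            } else {
                // Iterate the receiver and ask the argument via contains():
                // membership is decided by the argument's semantics.
                for (Iterator<?> it = self.iterator(); it.hasNext(); ) {
                    if (c.contains(it.next())) {
                        it.remove();
                        modified = true;
                    }
                }
            }
            return modified;
        }
    }

When both collections use ordinary equals/hashCode membership, the two branches remove the same elements and the size test is purely a performance heuristic; when either side is an ersatz Collection, the branches can remove different elements, which is the semantic problem under discussion.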
What I would most like to preserve are the performance advantages of using hash coding, which apparently was a goal of the original design: "implementations of the various Collections Framework interfaces are free to take advantage of the specified behavior of underlying {@link Object} methods wherever the implementor deems it appropriate". Promoting ersatz Collections to first-class citizens would be a significant incompatible change to the specification because it invalidates this statement. I note that the statement refers to *all* implementations of the Collections Framework interfaces, not just JDK implementations, so a JDK-internal solution is inadequate.

I agree that some way of identifying ersatz Collections at run time could be useful. It's tricky, however: SortedSets and TreeSets are ersatz or not depending on the behavior of the comparator/compare method, which is not discoverable, so the application would have to provide that information.
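To make that concrete, here is a small, purely illustrative example (the class name is made up): the set below is an ordinary TreeSet as far as its type and API are concerned, and only the application knows that its comparator is inconsistent with equals:

    import java.util.HashSet;
    import java.util.TreeSet;

    // Illustration only: a TreeSet ordered case-insensitively is "ersatz"
    // in the sense discussed here, because its ordering is inconsistent
    // with equals, yet nothing about its type or API reveals that.
    public class ErsatzExample {
        public static void main(String[] args) {
            TreeSet<String> ersatz = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
            ersatz.add("Hello");

            // Membership follows the comparator, not equals():
            System.out.println(ersatz.contains("HELLO"));                // true
            System.out.println(new HashSet<>(ersatz).contains("HELLO")); // false

            // The comparator itself is available, but it carries no
            // indication of whether it is consistent with equals.
            System.out.println(ersatz.comparator());
        }
    }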
Alan

> On May 16, 2019, at 11:48 PM, Peter Levart <peter.lev...@gmail.com> wrote:
> 
> Hi Alan,
> 
> I can sympathize with the performance loss aspect of this patch, but I nevertheless think it is a move in the right direction. You must admit that the performance optimization of AbstractSet.removeAll was a rather shaky one (dependent on the relative sizes of the two collections) and as such very prone to sudden performance spikes that could occur for seemingly unknown reason. More importantly, the fact that the semantics of the method could change drastically dependent on such thing as relative sizes of the two collections was inexplicable. You may argue that the semantics of the method is undefined if called upon or passed an ersatz Collection instance, but even the undefined semantics may be consistent - i.e. independent of such things as relative sizes of both collections.
> 
> It would be a perfect world if this optimization was not there to begin with. People would eventually find bottlenecks using this method and change their code to use explicit iteration or such. There may be perfectly valid code out there that uses conformant collection implementations that don't cause AbstractSet.removeAll to behave inconsistently and such code after this patch may perform badly. But it will perform consistently for cases that use ersatz Collections and that is more important in my opinion.
> 
> As Stuart says, optimizations mustn't change semantics. So is there a possibly narrower optimization, that doesn't change the semantics?
> 
> Here's a sketch of one:
> - create a marker interface (could be JDK-internal) and mark all conforming Set implementations with it
> - use AbstractSet.removeAll optimization only when both collections implement the marker interface
> 
> Regards, Peter
> 
> On 5/17/19 7:06 AM, Alan Snyder wrote:
>> Hi Stuart,
>> 
>> I believe I already accepted the fact of ersatz Collections.
>> 
>> Could you explain the inconsistency in the specification that affects removeAll()? I don’t see it.
>> 
>> In the current specification, the classes that can create ersatz Collections all call out the ersatz Collections in their documentation.
>> 
>> SortedSet:
>> Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be <i>consistent with equals</i> if the sorted set is to correctly implement the {@code Set} interface. The behavior of a sorted set <i>is</i> well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the {@code Set} interface.
>> 
>> TreeSet is similar.
>> 
>> IdentityHashMap:
>> This class is <i>not</i> a general-purpose {@code Map} implementation! While this class implements the {@code Map} interface, it intentionally violates {@code Map's} general contract, which mandates the use of the {@code equals} method when comparing objects.
>> 
>> IdentityHashMap.keySet:
>> While the object returned by this method implements the {@code Set} interface, it does <i>not</i> obey {@code Set's} general contract. Like its backing map, the set returned by this method defines element equality as reference-equality rather than object-equality. This affects the behavior of its {@code contains}, {@code remove}, {@code containsAll}, {@code equals}, and {@code hashCode} methods.
>> 
>> The logical implication of not supporting the general contract is that Collection operations performed on an ersatz Collection may have unexpected (implementation specific) effects, which implies that passing one of these ersatz Collections to a parameter of a Collection type may have unexpected (implementation specific) effects.
>> 
>> In other words, whether removeAll is performed on an ersatz Collection or the parameter is an ersatz Collection, the effects may be implementation specific and inconsistent with the specification of removeAll.
>> 
>> That is not to say that improving the design would not have benefit, of course it does. The bug reports you cite prove that. However, the compatibility impact of the change needs to be taken into account.
>> 
>> Alan
>> 
>>> On May 16, 2019, at 8:42 PM, Stuart Marks <stuart.ma...@oracle.com> wrote:
>>> 
>>> Hi Alan,
>>> 
>>> Whether you call them "ersatz" collections, "non-standard" collections, or collections with different membership semantics, they have been around in the collections framework from day one. This is perhaps unfortunate, and it may be considered to be bad design, but it is a fact.
>>> 
>>> The issue is not with undefined or implementation-specific behavior. The issue is that the specification as it stands is self-contradictory. In particular, it assumes that certain operations are symmetric that in fact are not, if you read other parts of the spec and derive logical conclusions from them. This is what I'm trying to fix. This is not a mere matter of intellectual consistency. This state of affairs has real consequences. There is a whole family of bugs linked to this one which complain both about the performance issue and the semantic issue. All of these bugs are directly related to the self-contradictory nature of the current specification. With my changes, the spec is self-contradictory in fewer places. (There is, however, more work to be done; see JDK-8190545.)
>>> 
>>> Your lead point, though, is about performance. The performance optimization of AbstractSet.removeAll is one of those that assumes that the operation is symmetric, when in fact it isn't. As a rule, optimizations mustn't change semantics. Therefore, it has to go.
>>> 
>>> s'marks
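P.S. To put the asymmetry Stuart describes in concrete terms, here is a purely hypothetical demonstration (class and variable names are mine, not from the JDK) of how the two possible iteration directions consult different membership tests when one side uses reference equality:

    import java.util.HashSet;
    import java.util.IdentityHashMap;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical demonstration: the two branches of a size-based removeAll
    // consult different membership tests, and with a reference-equality key
    // set they reach different conclusions about the very same elements.
    public class AsymmetryDemo {
        public static void main(String[] args) {
            String a = new String("x");
            String b = new String("x");          // a.equals(b) is true, but a != b

            Set<String> normal = new HashSet<>();
            normal.add(a);

            Map<String, Boolean> m = new IdentityHashMap<>();
            m.put(b, Boolean.TRUE);
            Set<String> identity = m.keySet();   // membership by reference equality

            // Branch A: iterate 'normal' and ask the argument via contains().
            Set<String> copyA = new HashSet<>(normal);
            copyA.removeIf(identity::contains);  // identity.contains(a) is false
            System.out.println(copyA);           // prints [x] -- nothing removed

            // Branch B: iterate the argument and call remove() on 'normal'.
            Set<String> copyB = new HashSet<>(normal);
            for (String e : identity)
                copyB.remove(e);                 // remove(b) uses equals(), so "x" goes
            System.out.println(copyB);           // prints [] -- element removed
        }
    }

Which of the two outcomes a size-based removeAll produces depends only on which collection happens to be larger, which is exactly the inconsistency being objected to.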