[ https://issues.apache.org/jira/browse/COLLECTIONS-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290679#comment-14290679 ]
Thomas Neidhart commented on COLLECTIONS-548: --------------------------------------------- The documentation clearly states what the method is doing: {noformat} * This implementation calls <code>retainAll()</code> on each collection. {noformat} The retainAll() method of the Collection interface is also well-known and by default (see AbstractCollection) calls contains() on the provided collection. Thus users should be aware of this by now (2015). If a user is really calling retainAll() with a huge list, it's probably better to put the elements in a set and provide this as an argument to retainAll(). My whole point is that there's no use in providing uber-collection types that have an optimal runtime-complexity in all cases but with the trade-off of additional memory requirements. Users have to chose and use proper collection types for their use case. > Performance issue in CompositeCollection::retainAll > --------------------------------------------------- > > Key: COLLECTIONS-548 > URL: https://issues.apache.org/jira/browse/COLLECTIONS-548 > Project: Commons Collections > Issue Type: Bug > Components: Collection > Affects Versions: 4.0 > Environment: Ubuntu 14.04 > Reporter: Oswaldo Olivo > Priority: Minor > Labels: performance > > There seems to be a performance problem with the function retainAll of > CompositeCollection. This is analogous to > https://issues.apache.org/jira/browse/COLLECTIONS-534 . > The following is the code of the function: > {code} > > /** > * Retains all the elements in the specified collection in this composite > collection, > * removing all others. > * <p> > * This implementation calls <code>retainAll()</code> on each collection. > * > * @param coll the collection to remove > * @return true if the collection was modified > * @throws UnsupportedOperationException if retainAll is unsupported > */ > public boolean retainAll(final Collection<?> coll) { > boolean changed = false; > for (final Collection<E> item : all) { > changed |= item.retainAll(coll); > } > return changed; > } > {code} > The performance problem occurs when the underlying collections in the current > collection have a slow retainAll method. Whenever we're relying on > Collection::retainAll, slow cases tend to occur when the parameter collection > has a slow contains method. > The following test harness shows the performance degradation between > using a list and using a set as a parameter, across different collection > sizes. > {code} > public static void compositeCollectionRetainAllTest(boolean original) { > int N=500000; > ArrayList<Integer> firstArrayList=new ArrayList<Integer>(); > ArrayList<Integer> secondArrayList=new ArrayList<Integer>(); > for(int i=0;i<N;++i) { > firstArrayList.add(new Integer(i)); > secondArrayList.add(new Integer(N-1-i)); > > } > CompositeCollection col = new CompositeCollection(firstArrayList); > col.retainAll(original ? secondArrayList : (new > HashSet<Integer>(secondArrayList))); > > } > {code} > In the following table "Original" corresponds to the time taken by > the existing implementation of CompositeCollection::retainAll, "Repaired" to > the time taken by the function invoked with the parameter converted into a > set, and "Speed-up" to the speed-up obtained after the repair. > N Original(s) Repaired(s) Speed-up(X) > 10 1.04 1.04 1.00 > 100 1.04 1.05 0.99 > 1000 1.06 1.06 1.00 > 10000 1.12 1.10 1.01 > 100000 9.34 1.15 8.12 > 500000 > 300 1.29 > 232.55 > If it's unacceptable to convert the parameter into a set before calling > retainsAll, a solution would be to include a warning to the user in the > documentation that the parameter should have a fast contains method when > possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)