[ 
https://issues.apache.org/jira/browse/COLLECTIONS-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290679#comment-14290679
 ] 

Thomas Neidhart commented on COLLECTIONS-548:
---------------------------------------------

The documentation clearly states what the method is doing:

{noformat}
     * This implementation calls <code>retainAll()</code> on each collection.
{noformat}

The retainAll() method of the Collection interface is also well-known and by 
default (see AbstractCollection) calls contains() on the provided collection. 
Thus users should be aware of this by now (2015). If a user is really calling 
retainAll() with a huge list, it's probably better to put the elements in a set 
and provide this as an argument to retainAll().

My whole point is that there's no use in providing uber-collection types that 
have an optimal runtime-complexity in all cases but with the trade-off of 
additional memory requirements. Users have to chose and use proper collection 
types for their use case.

> Performance issue in CompositeCollection::retainAll
> ---------------------------------------------------
>
>                 Key: COLLECTIONS-548
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-548
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Collection
>    Affects Versions: 4.0
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>            Priority: Minor
>              Labels: performance
>
> There seems to be a performance problem with the function retainAll of 
> CompositeCollection. This is analogous to 
> https://issues.apache.org/jira/browse/COLLECTIONS-534 .
> The following is the code of the function:
> {code}
>  
>     /**
>      * Retains all the elements in the specified collection in this composite 
> collection,
>      * removing all others.
>      * <p>
>      * This implementation calls <code>retainAll()</code> on each collection.
>      *
>      * @param coll  the collection to remove
>      * @return true if the collection was modified
>      * @throws UnsupportedOperationException if retainAll is unsupported
>      */
>     public boolean retainAll(final Collection<?> coll) {
>         boolean changed = false;
>         for (final Collection<E> item : all) {
>             changed |= item.retainAll(coll);
>         }
>         return changed;
>     }
> {code}
> The performance problem occurs when the underlying collections in the current 
> collection have a slow retainAll method. Whenever we're relying on 
> Collection::retainAll, slow cases tend to occur when the parameter collection 
> has a slow contains method.
> The following test harness shows the performance degradation between 
> using a list and using a set as a parameter, across different collection 
> sizes.
> {code}
>  public static void   compositeCollectionRetainAllTest(boolean original) {
>       int N=500000;
>       ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
>       ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
>       for(int i=0;i<N;++i) {
>           firstArrayList.add(new Integer(i));
>           secondArrayList.add(new Integer(N-1-i));
>           
>       }
>       CompositeCollection col = new CompositeCollection(firstArrayList);
>       col.retainAll(original ? secondArrayList : (new 
> HashSet<Integer>(secondArrayList)));
>       
>     }
> {code}
> In the following table "Original" corresponds to the time taken by 
> the existing implementation of CompositeCollection::retainAll, "Repaired" to 
> the time taken by the function invoked with the parameter converted into a 
> set, and "Speed-up" to the speed-up obtained after the repair.
> N             Original(s)     Repaired(s)     Speed-up(X)
> 10             1.04           1.04                    1.00
> 100            1.04           1.05                    0.99
> 1000  1.06                    1.06                    1.00
> 10000 1.12                    1.10                    1.01
> 100000        9.34                    1.15                    8.12
> 500000        > 300           1.29                    > 232.55
> If it's unacceptable to convert the parameter into a set before calling 
> retainsAll, a solution would be to include a warning to the user in the 
> documentation that the parameter should have a fast contains method when 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to