[
https://issues.apache.org/jira/browse/COLLECTIONS-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeffrey Barnes updated COLLECTIONS-433:
---------------------------------------
Attachment: COLLECTIONS-433.patch
I have developed a patch that greatly improves the performance of
{{TreeList.addAll(Collection)}}. The theoretical complexity is improved from
_O_(_n_ log _m_) to _O_(_n_ + log _m_), where _m_ is the size of the
{{TreeList}} and _n_ is the size of the collection to be added, and performance
tests show a significant practical gain.
The Stack Overflow question that Adrian Nistor linked is informative but not
the ideal algorithm to use here, as it handles the case where there are two
arbitrary (sorted) AVL trees that must be merged into one (sorted) AVL tree.
Because the elements of the two AVL trees may need to be interleaved in
arbitrary ways, the best we can do in that situation is to flatten the trees
into lists, merge them, and construct a new AVL tree, as the Stack Overflow
answer says.
The situation here, however, is somewhat different, because we know that all
the elements of the new tree come to the right of the original tree. (The AVL
tree that backs {{TreeList}} is keyed by list index, not by element value.)
Thus, a more on-point Stack Overflow question is [this
one|http://stackoverflow.com/questions/2037212/concatenating-merging-joining-two-avl-trees],
which deals with the question of merging/concatenating two AVL trees when the
elements of the first tree are known to have smaller keys than those of the
second tree. In this case, we can merge the trees in logarithmic time! The
algorithm, outlined by user meriton on Stack Overflow, is:
# Determine the height of both trees. Assume the right tree is taller. (The
other case is symmetric.)
# Remove the max element, _x_, from the left tree.
# In the right tree, navigate left until you reach the subtree, _s_, that is no
taller than the left tree.
# Replace that subtree with a new subtree whose root is _x_, whose left subtree
is the left tree, and whose right subtree is _s_.
# Rebalance.
This is a destructive operation, so to satisfy the contract for
{{addAll(Collection)}}, we have to first copy the argument to a new tree, which
takes _O_(_n_). But the good news is we can do this in _O_(_n_) for any
collection, not just a {{TreeList}}! So now {{addAll(Collection)}} has
complexity _O_(_n_ + log _m_) for _any_ collection, regardless of whether it is
a {{TreeList}}.
To accomplish linear conversion of collections to trees, I also reimplemented
the constructor {{TreeList(Collection)}}, which previously just invoked
{{addAll}}. The algorithm for converting a sorted list to an AVL tree in
_O_(_n_) time is given
[here|http://leetcode.com/2010/11/convert-sorted-list-to-balanced-binary.html].
Here are results of some performance tests where I used {{TreeList.addAll}} to
merge two collections of size _n_ for varying _n_ (averaged over 50 runs):
||n||old||new||
|160000|0.190|0.007|
|320000|0.407|0.015|
|480000|0.615|0.108|
|640000|0.853|0.142|
|800000|1.075|0.140|
|960000|1.315|0.144|
|1120000|1.617|0.158|
|1280000|1.871|0.248|
|1440000|2.109|0.258|
|1600000|2.316|0.173|
|1760000|2.559|0.288|
|1920000|2.852|0.292|
I have tested this patch rather extensively to ensure its correctness. I
subjected it to a test battery in which I randomly applied hundreds of
thousands of arbitrary operations to various {{TreeLists}}, checking list
invariants and contents after each test, so I am pretty confident in the
correctness of this new code.
I encountered an unrelated bug in {{TreeListIterator}} which I have also fixed
as part of this patch. Use of the {{remove}} operation could, for certain tree
structures, cause the iterator to enter a bad state and return incorrect
results. I include it with this patch because it was necessary to get a unit
test to pass. (My changes to {{addAll}} changed the structure of a tree
constructed in a unit test in a way that happened to elicit the bad behavior.)
This change is minor compared with the substantial changes to {{addAll}} and
the {{TreeList(Collection)}} constructor, but let me know if you'd like me to
say more on this point.
Incidentally, I have not touched {{addAll(int, Collection)}}, the overload of
{{addAll}} that adds the contents of a collection at a specified index, rather
than at the end of the list. I believe that could be done efficiently too, but
it would require a different algorithm.
This is my first contribution, so I hope I did things right. Please be gentle!
:)
> TreeList.addAll() complexity
> ----------------------------
>
> Key: COLLECTIONS-433
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-433
> Project: Commons Collections
> Issue Type: Improvement
> Affects Versions: 3.2.1
> Reporter: Adrian Nistor
> Attachments: COLLECTIONS-433.patch
>
>
> "TreeList.addAll(Collection coll)" has a higher complexity than
> necessary when "coll" is a "TreeList" object (because "addAll" just
> adds one element at a time). This can be done in just O(N) as
> described for example here:
> http://stackoverflow.com/questions/4458489/merging-2-diferent-avl-trees
> Are there any plans to improve this?
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira