[ https://issues.apache.org/jira/browse/CASSANDRA-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Branimir Lambov updated CASSANDRA-15510: ---------------------------------------- Reviewers: Benjamin Lerer, Branimir Lambov (was: Benjamin Lerer) Status: Review In Progress (was: Patch Available) > BTree: Improve Building, Inserting and Transforming > --------------------------------------------------- > > Key: CASSANDRA-15510 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15510 > Project: Cassandra > Issue Type: Improvement > Components: Local/Other > Reporter: Benedict Elliott Smith > Assignee: Benedict Elliott Smith > Priority: Normal > Fix For: 4.0.x, 4.x > > > This work was originally undertaken as a follow-up to CASSANDRA-15367 to > ensure performance is strictly improved, but it may no longer be needed for > that purpose. It’s still hugely impactful, however. It remains to be > decided where this should land. > The current {{BTree}} implementation is suboptimal in a number of ways, with > very little focus having been given to its performance besides its > memory-occupancy. This patch aims to address that, specifically improving > the performance and allocations involved in: building, transforming and > inserting into a tree. > To facilitate this work, the {{BTree}} definition is modified slightly, so > that we can perform some simple arithmetic on tree sizes. Specifically, > trees of depth n are defined to have a maximum capacity of {{branchFactor^n - > 1}}, which translates into capping the number of leaf children at > {{branchFactor-1}}, as opposed to {{branchFactor}}. Since {{branchFactor}} > is a power of 2, this permits fast tree size arithmetic, enabling some of > these changes. > h2. Building > The static build method has been modified to utilise dedicated > {{buildPerfect}} methods that build either perfectly dense or perfectly > sparse sub-trees. These perfect trees all share their {{sizeMap}} with each > other, and can be built more efficiently than trees of arbitrary size. The > specifics are described in detail in the comments, but this building block > can be used to construct trees of any size, using at most one child at each > level that is not either perfectly sparse or perfectly dense. Bulk methods > are used where possible. > For large trees this can produce up to 30x throughput improvement and 30% > allocation reduction vs 3.0 (TBC, and to be tested vs 4.0). > {{FastBuilder}} is introduced for building a tree in-order (or in reverse) > without duplicate elements to resolve, without necessarily knowing the size > upfront. This meets the needs of most use cases. Data is built directly > into nodes, with up to one already-constructed node, and one partially > constructed node, on each level, being mutated to share their contents in the > event of insufficient data to populate the tree. These builders are > thread-locally shared. These leads to minimal copying, the same sharing of > {{sizeMap}} as above, zero wasted allocations, and results in minimal > difference in performance between utilising the less-ergonomic static build > and builder approach. > For large trees this leads to ~4.5x throughput improvement, and 70% reduction > in allocations vs a normal Builder. For small trees performance is > comparable, but allocations similarly reduced. > h2. Inserting > It turns out that we only ever insert another tree into a tree, so we exploit > this to implement an efficient union of two trees, operating on them directly > via stacks in the transformer, instead of via a collection interface. A > builder-like object is introduced that shares functionality with > {{FastBuilder}}, and permits us to build the result of the union directly > into the final nodes, reusing as much of the original trees as possible. > Bulk methods are used where possible. > The result is not _uniformly_ faster, but is _significantly_ faster on > average: median _improvement_ of 1.4x (that is, 2.4x total throughput), mean > improvement of 10x. Worst reduction is 30%, and it may be that we can > isolate and alleviate that. Allocations are also reduced significantly, with > a median of 30% and mean of 42% for the tested workloads. As the trees get > larger the improvement drops, but remains uniformly lower. > h2. Transforming > Transformations garbage overhead is minimal, i.e. the main allocations are > those necessary to represent the new tree. It is significantly faster and > particularly more efficient when removing elements, utilising the shared > functionality of the builder and transformer objects to define an efficient > builder that reuses as much of the original tree as possible. > We also introduce dedicated {{transform}} methods (that forbid returning > {{null}}), and {{BiFunction}} transformations to permit efficient follow-ups. -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org