[jira] [Commented] (CASSANDRA-15510) BTree: Improve Building, Inserting and Transforming
[ https://issues.apache.org/jira/browse/CASSANDRA-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494985#comment-17494985 ]

Benjamin Lerer commented on CASSANDRA-15510:

[PR|https://github.com/apache/cassandra/pull/1462] for the 4.0 branch. The JMH tests still need to be ported.

The CI results can be found [here|https://app.circleci.com/pipelines/github/blerer/cassandra/267/workflows/99bae171-a3ea-4065-8825-4648de507741]

> BTree: Improve Building, Inserting and Transforming
> ---
>
> Key: CASSANDRA-15510
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15510
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Other
> Reporter: Benedict Elliott Smith
> Assignee: Benedict Elliott Smith
> Priority: Normal
> Fix For: 4.0.x, 4.x
>
> This work was originally undertaken as a follow-up to CASSANDRA-15367 to ensure performance is strictly improved, but it may no longer be needed for that purpose. It’s still hugely impactful, however. It remains to be decided where this should land.
>
> The current {{BTree}} implementation is suboptimal in a number of ways, with very little focus having been given to its performance besides its memory occupancy. This patch aims to address that, specifically improving the performance and allocations involved in: building, transforming and inserting into a tree.
>
> To facilitate this work, the {{BTree}} definition is modified slightly, so that we can perform some simple arithmetic on tree sizes. Specifically, trees of depth n are defined to have a maximum capacity of {{branchFactor^n - 1}}, which translates into capping the number of leaf children at {{branchFactor-1}}, as opposed to {{branchFactor}}. Since {{branchFactor}} is a power of 2, this permits fast tree size arithmetic, enabling some of these changes.
>
> h2. Building
>
> The static build method has been modified to utilise dedicated {{buildPerfect}} methods that build either perfectly dense or perfectly sparse sub-trees. These perfect trees all share their {{sizeMap}} with each other, and can be built more efficiently than trees of arbitrary size. The specifics are described in detail in the comments, but this building block can be used to construct trees of any size, using at most one child at each level that is not either perfectly sparse or perfectly dense. Bulk methods are used where possible.
>
> For large trees this can produce up to 30x throughput improvement and 30% allocation reduction vs 3.0 (TBC, and to be tested vs 4.0).
>
> {{FastBuilder}} is introduced for building a tree in-order (or in reverse) without duplicate elements to resolve, without necessarily knowing the size upfront. This meets the needs of most use cases. Data is built directly into nodes, with up to one already-constructed node, and one partially constructed node, on each level, being mutated to share their contents in the event of insufficient data to populate the tree. These builders are thread-locally shared. This leads to minimal copying, the same sharing of {{sizeMap}} as above, zero wasted allocations, and results in minimal difference in performance between utilising the less-ergonomic static build and builder approach.
>
> For large trees this leads to ~4.5x throughput improvement, and 70% reduction in allocations vs a normal Builder. For small trees performance is comparable, but allocations similarly reduced.
>
> h2. Inserting
>
> It turns out that we only ever insert another tree into a tree, so we exploit this to implement an efficient union of two trees, operating on them directly via stacks in the transformer, instead of via a collection interface. A builder-like object is introduced that shares functionality with {{FastBuilder}}, and permits us to build the result of the union directly into the final nodes, reusing as much of the original trees as possible. Bulk methods are used where possible.
>
> The result is not _uniformly_ faster, but is _significantly_ faster on average: median _improvement_ of 1.4x (that is, 2.4x total throughput), mean improvement of 10x. The worst-case reduction is 30%, and it may be that we can isolate and alleviate that. Allocations are also reduced significantly, with a median of 30% and mean of 42% for the tested workloads. As the trees get larger the improvement drops, but allocations remain uniformly lower.
>
> h2. Transforming
>
> The garbage overhead of transformations is minimal, i.e. the main allocations are those necessary to represent the new tree. It is significantly faster and particularly more efficient when removing elements, utilising the shared functionality of the builder and transformer objects to define an efficient builder that reuses as much of the original tree as possible.
>
> We also introduce dedicated {{transform}} methods (that forbid returning {{null}}), and {{BiFunction}} transformations to permit efficient follow-ups.
[jira] [Updated] (CASSANDRA-15510) BTree: Improve Building, Inserting and Transforming
[ https://issues.apache.org/jira/browse/CASSANDRA-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-15510:
---
    Status: Patch Available  (was: In Progress)

> BTree: Improve Building, Inserting and Transforming
> ---
>
> Key: CASSANDRA-15510
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15510
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Other
> Reporter: Benedict Elliott Smith
> Assignee: Benedict Elliott Smith
> Priority: Normal
> Fix For: 4.0.x, 4.x
>
> This work was originally undertaken as a follow-up to CASSANDRA-15367 to ensure performance is strictly improved, but it may no longer be needed for that purpose. It’s still hugely impactful, however. It remains to be decided where this should land.
>
> The current {{BTree}} implementation is suboptimal in a number of ways, with very little focus having been given to its performance besides its memory occupancy. This patch aims to address that, specifically improving the performance and allocations involved in: building, transforming and inserting into a tree.
>
> To facilitate this work, the {{BTree}} definition is modified slightly, so that we can perform some simple arithmetic on tree sizes. Specifically, trees of depth n are defined to have a maximum capacity of {{branchFactor^n - 1}}, which translates into capping the number of leaf children at {{branchFactor-1}}, as opposed to {{branchFactor}}. Since {{branchFactor}} is a power of 2, this permits fast tree size arithmetic, enabling some of these changes.
>
> h2. Building
>
> The static build method has been modified to utilise dedicated {{buildPerfect}} methods that build either perfectly dense or perfectly sparse sub-trees. These perfect trees all share their {{sizeMap}} with each other, and can be built more efficiently than trees of arbitrary size. The specifics are described in detail in the comments, but this building block can be used to construct trees of any size, using at most one child at each level that is not either perfectly sparse or perfectly dense. Bulk methods are used where possible.
>
> For large trees this can produce up to 30x throughput improvement and 30% allocation reduction vs 3.0 (TBC, and to be tested vs 4.0).
>
> {{FastBuilder}} is introduced for building a tree in-order (or in reverse) without duplicate elements to resolve, without necessarily knowing the size upfront. This meets the needs of most use cases. Data is built directly into nodes, with up to one already-constructed node, and one partially constructed node, on each level, being mutated to share their contents in the event of insufficient data to populate the tree. These builders are thread-locally shared. This leads to minimal copying, the same sharing of {{sizeMap}} as above, zero wasted allocations, and results in minimal difference in performance between utilising the less-ergonomic static build and builder approach.
>
> For large trees this leads to ~4.5x throughput improvement, and 70% reduction in allocations vs a normal Builder. For small trees performance is comparable, but allocations similarly reduced.
>
> h2. Inserting
>
> It turns out that we only ever insert another tree into a tree, so we exploit this to implement an efficient union of two trees, operating on them directly via stacks in the transformer, instead of via a collection interface. A builder-like object is introduced that shares functionality with {{FastBuilder}}, and permits us to build the result of the union directly into the final nodes, reusing as much of the original trees as possible. Bulk methods are used where possible.
>
> The result is not _uniformly_ faster, but is _significantly_ faster on average: median _improvement_ of 1.4x (that is, 2.4x total throughput), mean improvement of 10x. The worst-case reduction is 30%, and it may be that we can isolate and alleviate that. Allocations are also reduced significantly, with a median of 30% and mean of 42% for the tested workloads. As the trees get larger the improvement drops, but allocations remain uniformly lower.
>
> h2. Transforming
>
> The garbage overhead of transformations is minimal, i.e. the main allocations are those necessary to represent the new tree. It is significantly faster and particularly more efficient when removing elements, utilising the shared functionality of the builder and transformer objects to define an efficient builder that reuses as much of the original tree as possible.
>
> We also introduce dedicated {{transform}} methods (that forbid returning {{null}}), and {{BiFunction}} transformations to permit efficient follow-ups.
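The tree size arithmetic mentioned in the description can be illustrated with a short sketch. This is Python for illustration only, not the Cassandra (Java) implementation; the branch factor of 32 (`BRANCH_SHIFT = 5`) and the function names `max_capacity` and `min_depth` are assumptions made for the example.

```python
# Assumption: branchFactor = 32 = 2**5. The description only says it is a power of 2.
BRANCH_SHIFT = 5
BRANCH_FACTOR = 1 << BRANCH_SHIFT


def max_capacity(depth):
    """Maximum number of keys in a tree of the given depth: branchFactor^depth - 1.

    Because branchFactor is a power of 2, the power is a single shift.
    """
    return (1 << (BRANCH_SHIFT * depth)) - 1


def min_depth(size):
    """Smallest depth whose maximum capacity can hold `size` keys."""
    depth = 1
    while max_capacity(depth) < size:
        depth += 1
    return depth


if __name__ == "__main__":
    # A leaf (depth 1) holds at most branchFactor - 1 keys.
    print(max_capacity(1))
    # A branch with branchFactor full children plus branchFactor - 1 keys of its own
    # holds branchFactor * capacity(depth - 1) + (branchFactor - 1) keys.
    print(max_capacity(2), BRANCH_FACTOR * max_capacity(1) + (BRANCH_FACTOR - 1))
    print(min_depth(1024))
```

The recurrence in the second comment is why capping leaves at `branchFactor - 1` keys gives the closed form `branchFactor^n - 1` at every depth.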
[jira] [Commented] (CASSANDRA-17277) Automate updating tickets with CI results after merge
[ https://issues.apache.org/jira/browse/CASSANDRA-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494980#comment-17494980 ]

Michael Semb Wever commented on CASSANDRA-17277:

bq. Notably, the butler run for 940 doesn't show the TestArchiveCommitlog failure, however in Jenkins if you go to all test failure results for the build you see it there

I don't get what the intent of this sentence is…

Butler seems to store only 16 builds, and ci-cassandra.a.o stores only 30 builds.

bq. The above output is created by taking the passed in build number and walking back (and locally caching) the results of all valid previous builds. This local data is kept in a simple JSON file per branch which we then use for subsequent calculation of the # of runs we've seen a given test in and the # of failures; …

This then becomes much more than a cache; it's an important store of history. We have flakies that are 1:100, well beyond both Butler's and ci-cassandra.a.o's history.

Also… the challenge that each Jenkins agent needs to store its own cache (with access to it being concurrency-safe for both executors) is exacerbated because jobs are sticky to an agent and only get spawned on a new agent when saturation requires it. That is, every (e.g.) 40 builds a job could run on a new agent where the cache needs to be reconstructed, but because it cannot fetch anything older than 30 builds, that and subsequent commits are going to get associated with past flakies…

> Automate updating tickets with CI results after merge
> -
>
> Key: CASSANDRA-17277
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17277
> Project: Cassandra
> Issue Type: Task
> Components: Build, CI
> Reporter: Josh McKenzie
> Assignee: Josh McKenzie
> Priority: Normal
>
> From [the wiki article|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199530280]
> {quote}ci-cassandra run bot updating ticket in JIRA w/state of test run for the SHA (JIRA Pending; to be linked)
> {quote}
> We already run CI for every commit (see [example|https://ci-cassandra.apache.org/job/Cassandra-trunk/]). The goal is to have automation that'll tie the results of a commit to the original JIRA ticket and update it automatically w/a comment indicating:
> * The results of the CI run (duration, pass, fail, # failures)
> * Any potential regressions in CI from the merge w/links to job history ([example|https://ci-cassandra.apache.org/job/Cassandra-trunk/912/testReport/junit/dtest.cqlsh_tests.test_cqlsh_copy/TestCqlshCopy/test_bulk_round_trip_with_timeouts/history/])
>
> From a workflow perspective we want to optimize for as little friction as possible when seeing the impact of one's commit on ci-cassandra, and for rapidly verifying whether the change appears to have destabilized any tests on that infrastructure.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
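The per-branch JSON store discussed in the comment above (runs seen and failures per test) could look roughly like the sketch below. This is a hypothetical illustration, not the script under review: the file layout and the names `load_history`, `record_build` and `failure_rate` are all assumptions, and the sketch deliberately ignores the concurrency and agent-locality concerns raised in the comment.

```python
import json
import tempfile
from pathlib import Path


def load_history(path):
    """Return the stored history, or an empty one if the file does not exist yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"builds": [], "tests": {}}


def record_build(path, build_number, results):
    """Fold one build's results (test name -> passed?) into the per-branch file."""
    history = load_history(path)
    if build_number in history["builds"]:
        return history  # already counted, so re-processing a build is a no-op
    history["builds"].append(build_number)
    for test, passed in results.items():
        stats = history["tests"].setdefault(test, {"runs": 0, "failures": 0})
        stats["runs"] += 1
        if not passed:
            stats["failures"] += 1
    path.write_text(json.dumps(history, indent=2, sort_keys=True))
    return history


def failure_rate(history, test):
    """Failures divided by runs for one test, or None if it was never seen."""
    stats = history["tests"].get(test)
    return stats["failures"] / stats["runs"] if stats else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "trunk.json"  # hypothetical: one file per branch
        record_build(store, 939, {"TestArchiveCommitlog": True})
        history = record_build(store, 940, {"TestArchiveCommitlog": False})
        print(failure_rate(history, "TestArchiveCommitlog"))
```

Because the counts accumulate beyond whatever window the CI server retains, losing this file loses history that cannot be refetched, which is the point the comment makes about it being more than a cache.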
[jira] [Commented] (CASSANDRA-17293) Update python test framework from nose to pytest
[ https://issues.apache.org/jira/browse/CASSANDRA-17293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494969#comment-17494969 ]

Brandon Williams commented on CASSANDRA-17293:
--

I am out for a week but am +1 here if CI looks good.

> Update python test framework from nose to pytest
> 
>
> Key: CASSANDRA-17293
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17293
> Project: Cassandra
> Issue Type: Task
> Components: CQL/Interpreter
> Reporter: Brad Schoening
> Assignee: Brad Schoening
> Priority: Normal
> Fix For: 4.x
>
> I had trouble trying to install and run the python nose test from pip (nosetest not found).
>
> According to the homepage of nose at [https://nose.readthedocs.io/en/latest/]
> h1. _Note to Users_
> _Nose has been in maintenance mode for the past several years and will likely cease without a new person/team to take over maintainership. New projects should consider using [Nose2|https://github.com/nose-devs/nose2], [py.test|http://pytest.org/], or just plain unittest/unittest2._
>
> Upgrading to pytest is likely the least effort.
[jira] [Updated] (CASSANDRA-17338) Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
[ https://issues.apache.org/jira/browse/CASSANDRA-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksei Zotov updated CASSANDRA-17338:
--
    Since Version: NA
    Source Control Link: https://github.com/apache/cassandra/commit/d17b16c9b1bf3e325d415b3777c2a7bd24c764ce
    Resolution: Fixed
    Status: Resolved  (was: Ready to Commit)

Merged as [https://github.com/apache/cassandra/commit/d17b16c9b1bf3e325d415b3777c2a7bd24c764ce].

Thanks for the quick review!

> Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
> --
>
> Key: CASSANDRA-17338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17338
> Project: Cassandra
> Issue Type: Bug
> Components: CQL/Interpreter
> Reporter: Brandon Williams
> Assignee: Aleksei Zotov
> Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
> Failed 4 times in the last 24 runs. Flakiness: 30%, Stability: 83%
>
> A bunch of the test_completion_* tests fail occasionally with an eyebleed inducing mismatched output.
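The "Stability: 83%" figure in the quoted description is consistent with reading stability as the share of passing runs. That reading is an assumption: the ticket does not define either metric, and the flakiness figure is evidently not simply failures over runs (4/24 is about 17%, not 30%). A minimal sketch under that assumption, with the hypothetical name `stability_percent`:

```python
def stability_percent(runs, failures):
    """Share of runs that passed, as a whole-number percentage.

    Assumption: "stability" means passed runs / total runs; the ticket
    itself does not spell out the formula.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    return round(100 * (runs - failures) / runs)


if __name__ == "__main__":
    # 4 failures in the last 24 runs, as reported in the ticket.
    print(stability_percent(24, 4))
```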
[jira] [Updated] (CASSANDRA-17338) Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
[ https://issues.apache.org/jira/browse/CASSANDRA-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksei Zotov updated CASSANDRA-17338:
--
    Fix Version/s: 3.0.27
                   3.11.13
                       (was: 3.0.x)
                       (was: 3.11.x)

> Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
> --
>
> Key: CASSANDRA-17338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17338
> Project: Cassandra
> Issue Type: Bug
> Components: CQL/Interpreter
> Reporter: Brandon Williams
> Assignee: Aleksei Zotov
> Priority: Normal
> Fix For: 3.0.27, 3.11.13
>
> Failed 4 times in the last 24 runs. Flakiness: 30%, Stability: 83%
>
> A bunch of the test_completion_* tests fail occasionally with an eyebleed inducing mismatched output.
[cassandra] branch cassandra-3.0 updated (679740f -> d17b16c)
This is an automated email from the ASF dual-hosted git repository.

azotcsit pushed a change to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from 679740f  Added CVE-2021-44521 to CHANGES.txt, NEWS.txt
     add d17b16c  Fix flaky test - test_cqlsh_completion.TestCqlshCompletion

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt                                  |  1 +
 pylib/cqlshlib/test/run_cqlsh.py             |  4 ++--
 pylib/cqlshlib/test/test_cqlsh_completion.py | 19 ---
 3 files changed, 19 insertions(+), 5 deletions(-)
[cassandra] branch cassandra-4.0 updated (85fd49f -> 5bc9f7c)
This is an automated email from the ASF dual-hosted git repository.

azotcsit pushed a change to branch cassandra-4.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from 85fd49f  Merge branch 'cassandra-3.11' into cassandra-4.0
     add d17b16c  Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
     add c3d51a8  Merge branch 'cassandra-3.0' into cassandra-3.11
     add 5bc9f7c  Merge branch 'cassandra-3.11' into cassandra-4.0

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)
[cassandra] branch cassandra-3.11 updated (593872c -> c3d51a8)
This is an automated email from the ASF dual-hosted git repository.

azotcsit pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from 593872c  Merge branch 'cassandra-3.0' into cassandra-3.11
     add d17b16c  Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
     add c3d51a8  Merge branch 'cassandra-3.0' into cassandra-3.11

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt                                  |  3 ++-
 pylib/cqlshlib/test/run_cqlsh.py             |  4 ++--
 pylib/cqlshlib/test/test_cqlsh_completion.py | 19 ---
 3 files changed, 20 insertions(+), 6 deletions(-)
[cassandra] 01/01: Merge branch 'cassandra-4.0' into trunk
This is an automated email from the ASF dual-hosted git repository.

azotcsit pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit d96c32b0b36d732df898f1d1a732c9398c2d775b
Merge: c08baf2 5bc9f7c
Author: Aleksei Zotov
AuthorDate: Sat Feb 19 13:39:52 2022 +0400

    Merge branch 'cassandra-4.0' into trunk

 CHANGES.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --cc CHANGES.txt
index 9bbb57e,9378003..ab0ff4d
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -138,17 -46,7 +138,18 @@@ Merged from 3.11
   * Add key validation to ssstablescrub (CASSANDRA-16969)
   * Update Jackson from 2.9.10 to 2.12.5 (CASSANDRA-16851)
   * Make assassinate more resilient to missing tokens (CASSANDRA-16847)
 + * Validate SASI tokenizer options before adding index to schema (CASSANDRA-15135)
 + * Fixup scrub output when no data post-scrub and clear up old use of row, which really means partition (CASSANDRA-16835)
 + * Reduce thread contention in CommitLogSegment and HintsBuffer (CASSANDRA-16072)
 + * Make cqlsh use the same set of reserved keywords than the server uses (CASSANDRA-15663)
 + * Optimize bytes skipping when reading SSTable files (CASSANDRA-14415)
 + * Enable tombstone compactions when unchecked_tombstone_compaction is set in TWCS (CASSANDRA-14496)
 + * Read only the required SSTables for single partition queries (CASSANDRA-16737)
  Merged from 3.0:
++ * Fix flaky test - test_cqlsh_completion.TestCqlshCompletion (CASSANDRA-17338)
 + * Fixed TestCqlshOutput failing tests (CASSANDRA-17386)
 + * Lazy transaction log replica creation allows incorrect replica content divergence during anticompaction (CASSANDRA-17273)
 + * LeveledCompactionStrategy disk space check improvements (CASSANDRA-17272)
   * Fix conversion from megabits to bytes in streaming rate limiter (CASSANDRA-17243)
   * Upgrade logback to 1.2.9 (CASSANDRA-17204)
   * Avoid race in AbstractReplicationStrategy endpoint caching (CASSANDRA-16673)
[cassandra] branch trunk updated (c08baf2 -> d96c32b)
This is an automated email from the ASF dual-hosted git repository.

azotcsit pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from c08baf2  ninja-fix CHANGES.txt so >4.0.0 entries are "Merged in …" entries
     add d17b16c  Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
     add c3d51a8  Merge branch 'cassandra-3.0' into cassandra-3.11
     add 5bc9f7c  Merge branch 'cassandra-3.11' into cassandra-4.0
     new d96c32b  Merge branch 'cassandra-4.0' into trunk

The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 CHANGES.txt | 2 ++
 1 file changed, 2 insertions(+)