[jira] [Commented] (CASSANDRA-15510) BTree: Improve Building, Inserting and Transforming

2022-02-19 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494985#comment-17494985
 ] 

Benjamin Lerer commented on CASSANDRA-15510:


[PR|https://github.com/apache/cassandra/pull/1462] for the 4.0 branch. The JMH 
tests still need to be ported.
CI results can be found 
[here|https://app.circleci.com/pipelines/github/blerer/cassandra/267/workflows/99bae171-a3ea-4065-8825-4648de507741].



[jira] [Updated] (CASSANDRA-15510) BTree: Improve Building, Inserting and Transforming

2022-02-19 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15510:
---
Status: Patch Available  (was: In Progress)

> BTree: Improve Building, Inserting and Transforming
> ---
>
> Key: CASSANDRA-15510
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15510
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Other
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
> Fix For: 4.0.x, 4.x
>
>
> This work was originally undertaken as a follow-up to CASSANDRA-15367 to 
> ensure performance is strictly improved, but it may no longer be needed for 
> that purpose.  It’s still hugely impactful, however.  It remains to be 
> decided where this should land.
> The current {{BTree}} implementation is suboptimal in a number of ways, with 
> very little attention having been paid to its performance beyond its 
> memory occupancy.  This patch aims to address that, specifically improving 
> the performance and allocations involved in building, transforming and 
> inserting into a tree.
> To facilitate this work, the {{BTree}} definition is modified slightly, so 
> that we can perform some simple arithmetic on tree sizes.  Specifically, 
> trees of depth n are defined to have a maximum capacity of {{branchFactor^n - 
> 1}}, which translates into capping the number of leaf children at 
> {{branchFactor-1}}, as opposed to {{branchFactor}}.  Since {{branchFactor}} 
> is a power of 2, this permits fast tree size arithmetic, enabling some of 
> these changes.
> h2. Building
> The static build method has been modified to utilise dedicated 
> {{buildPerfect}} methods that build either perfectly dense or perfectly 
> sparse sub-trees.  These perfect trees all share their {{sizeMap}} with each 
> other, and can be built more efficiently than trees of arbitrary size.  The 
> specifics are described in detail in the comments, but this building block 
> can be used to construct trees of any size, using at most one child at each 
> level that is not either perfectly sparse or perfectly dense.  Bulk methods 
> are used where possible.
> For large trees this can produce up to 30x throughput improvement and 30% 
> allocation reduction vs 3.0 (TBC, and to be tested vs 4.0).
> {{FastBuilder}} is introduced for building a tree in-order (or in reverse) 
> without duplicate elements to resolve, without necessarily knowing the size 
> upfront.  This meets the needs of most use cases.  Data is built directly 
> into nodes, with up to one already-constructed node, and one partially 
> constructed node, on each level, being mutated to share their contents in the 
> event of insufficient data to populate the tree.  These builders are 
> thread-locally shared.  This leads to minimal copying, the same sharing of 
> {{sizeMap}} as above and zero wasted allocations, and results in minimal 
> difference in performance between the less-ergonomic static build method 
> and the builder approach.
> For large trees this leads to ~4.5x throughput improvement, and 70% reduction 
> in allocations vs a normal Builder.  For small trees performance is 
> comparable, but allocations similarly reduced.
> h2. Inserting
> It turns out that we only ever insert another tree into a tree, so we exploit 
> this to implement an efficient union of two trees, operating on them directly 
> via stacks in the transformer, instead of via a collection interface.  A 
> builder-like object is introduced that shares functionality with 
> {{FastBuilder}}, and permits us to build the result of the union directly 
> into the final nodes, reusing as much of the original trees as possible.  
> Bulk methods are used where possible.
> The result is not _uniformly_ faster, but is _significantly_ faster on 
> average: median _improvement_ of 1.4x (that is, 2.4x total throughput), mean 
> improvement of 10x.  Worst reduction is 30%, and it may be that we can 
> isolate and alleviate that.  Allocations are also reduced significantly, with 
> a median of 30% and mean of 42% for the tested workloads.  As the trees get 
> larger the improvement drops, but allocations remain uniformly lower.
> h2. Transforming
> The garbage overhead of transformations is minimal, i.e. the main allocations are 
> those necessary to represent the new tree.  It is significantly faster and 
> particularly more efficient when removing elements, utilising the shared 
> functionality of the builder and transformer objects to define an efficient 
> builder that reuses as much of the original tree as possible. 
> We also introduce dedicated {{transform}} methods (that forbid returning 
> {{null}}), and {{BiFunction}} transformations to permit efficient follow-ups.
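The size arithmetic in the description can be illustrated with a minimal Java sketch. It assumes a branch factor of 32 (2^5), and the names BRANCH_SHIFT, maxCapacity and minDepth are hypothetical, not taken from the patch. Because the branch factor is a power of two, the capacity branchFactor^n - 1 of a depth-n tree reduces to a shift and a subtraction:

```java
public final class BTreeCapacitySketch {
    // Assumption for illustration: branchFactor = 2^BRANCH_SHIFT = 32
    static final int BRANCH_SHIFT = 5;
    static final int BRANCH_FACTOR = 1 << BRANCH_SHIFT;

    // Maximum number of elements in a tree of the given depth: branchFactor^depth - 1
    static long maxCapacity(int depth) {
        return (1L << (BRANCH_SHIFT * depth)) - 1;
    }

    // Smallest depth whose capacity can hold `size` elements
    static int minDepth(long size) {
        int depth = 1;
        while (maxCapacity(depth) < size)
            depth++;
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(maxCapacity(1)); // 31
        System.out.println(maxCapacity(2)); // 1023
        // A full depth-2 tree: 32 leaves of 31 keys each, plus 31 keys in the root
        System.out.println((long) BRANCH_FACTOR * (BRANCH_FACTOR - 1) + (BRANCH_FACTOR - 1)); // 1023
        System.out.println(minDepth(31)); // 1
        System.out.println(minDepth(32)); // 2
    }
}
```

The third line of output is a sanity check that capping each leaf at branchFactor-1 elements makes the per-level sums collapse to exactly branchFactor^n - 1.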

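The following sketch shows the general idea of building a perfectly dense sub-tree of depth d directly from sorted input. It is not the patch's buildPerfect. The node layout is a simplifying assumption: a leaf is an array of branchFactor-1 keys, and a branch holds branchFactor-1 separator keys followed by branchFactor children, with no sizeMap. A tiny branch factor of 4 is used so the output stays readable. Every child of a depth-d node holds exactly branchFactor^(d-1) - 1 elements, which is why all nodes at the same depth of a perfect tree could share a single sizeMap.

```java
import java.util.Arrays;

public final class PerfectBuildSketch {
    // Hypothetical small branch factor (a power of two) to keep the example readable
    static final int B = 4;

    // Capacity of a perfectly dense tree of the given depth: B^depth - 1
    static int capacity(int depth) {
        int c = 1;
        for (int i = 0; i < depth; i++)
            c *= B;
        return c - 1;
    }

    // Builds a perfectly dense tree from sorted[offset, offset + capacity(depth)).
    // Simplified layout (assumption): leaf = B-1 keys; branch = B-1 keys then B children.
    static Object[] buildPerfectDense(Object[] sorted, int offset, int depth) {
        if (depth == 1)
            return Arrays.copyOfRange(sorted, offset, offset + B - 1); // bulk copy for a leaf
        int childSize = capacity(depth - 1);
        Object[] node = new Object[(B - 1) + B];
        int pos = offset;
        for (int i = 0; i < B; i++) {
            node[(B - 1) + i] = buildPerfectDense(sorted, pos, depth - 1);
            pos += childSize;
            if (i < B - 1)
                node[i] = sorted[pos++]; // separator between child i and child i+1
        }
        return node;
    }

    // In-order walk, used here only to check the result
    static void inOrder(Object[] node, int depth, StringBuilder out) {
        if (depth == 1) {
            for (Object k : node)
                out.append(k).append(' ');
            return;
        }
        for (int i = 0; i < B; i++) {
            inOrder((Object[]) node[(B - 1) + i], depth - 1, out);
            if (i < B - 1)
                out.append(node[i]).append(' ');
        }
    }

    public static void main(String[] args) {
        int depth = 2;
        Object[] sorted = new Object[capacity(depth)]; // 4^2 - 1 = 15 elements
        for (int i = 0; i < sorted.length; i++)
            sorted[i] = i;
        Object[] root = buildPerfectDense(sorted, 0, depth);
        StringBuilder sb = new StringBuilder();
        inOrder(root, depth, sb);
        System.out.println(sb.toString().trim()); // 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
    }
}
```

Perfectly sparse sub-trees, and the single irregular child per level that the description allows for, are not modelled here.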

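According to the description, the union that replaces insertion operates directly on tree nodes and reuses unchanged sub-trees. The sketch below flattens both trees to sorted, duplicate-free lists, so it shows only the merge order and how duplicates are resolved, not node reuse. The names union and resolver are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BinaryOperator;

public final class UnionSketch {
    // Merges two sorted, duplicate-free lists into one sorted, duplicate-free list.
    // When both inputs hold an equal element, `resolver` picks the value to keep.
    static <T> List<T> union(List<T> a, List<T> b, Comparator<? super T> cmp, BinaryOperator<T> resolver) {
        List<T> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int c = cmp.compare(a.get(i), b.get(j));
            if (c < 0)
                out.add(a.get(i++));
            else if (c > 0)
                out.add(b.get(j++));
            else
                out.add(resolver.apply(a.get(i++), b.get(j++)));
        }
        // At most one input still has elements left; append the remainder in bulk
        out.addAll(a.subList(i, a.size()));
        out.addAll(b.subList(j, b.size()));
        return out;
    }

    public static void main(String[] args) {
        List<Integer> existing = List.of(1, 3, 5, 7);
        List<Integer> update = List.of(2, 3, 8);
        Comparator<Integer> natural = Comparator.naturalOrder();
        // On a conflict, keep the value from the update
        System.out.println(union(existing, update, natural, (oldV, newV) -> newV)); // [1, 2, 3, 5, 7, 8]
    }
}
```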

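The two kinds of transformation can be sketched over a flat list rather than a tree. All names here are hypothetical. transform rejects null results. The BiFunction overload passes an extra parameter to every call, which we assume lets callers avoid allocating a capturing lambda. transformOrRemove treats null as removal and returns the original input unchanged when nothing changes, loosely mirroring the goal of reusing as much of the original tree as possible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.BiFunction;
import java.util.function.Function;

public final class TransformSketch {
    // Maps every element; a null result is a programming error, not a removal request
    static <V, R> List<R> transform(List<V> in, Function<? super V, ? extends R> f) {
        List<R> out = new ArrayList<>(in.size());
        for (V v : in)
            out.add(Objects.requireNonNull(f.apply(v), "transform must not return null"));
        return out;
    }

    // Same, but a caller-supplied parameter is passed to every call of f
    static <V, P, R> List<R> transform(List<V> in, BiFunction<? super V, ? super P, ? extends R> f, P param) {
        List<R> out = new ArrayList<>(in.size());
        for (V v : in)
            out.add(Objects.requireNonNull(f.apply(v, param), "transform must not return null"));
        return out;
    }

    // null means "remove". Nothing is copied until the first change; if nothing
    // changes at all, the original list instance is returned
    static <V> List<V> transformOrRemove(List<V> in, Function<? super V, ? extends V> f) {
        List<V> out = null;
        for (int i = 0; i < in.size(); i++) {
            V v = in.get(i);
            V r = f.apply(v);
            if (out == null) {
                if (r == v)
                    continue;                            // still identical to the input
                out = new ArrayList<>(in.subList(0, i)); // bulk-copy the unchanged prefix
            }
            if (r != null)
                out.add(r);
        }
        return out == null ? in : out;
    }

    public static void main(String[] args) {
        List<Integer> scaled = transform(List.of(1, 2, 3), (v, k) -> v * k, 10);
        System.out.println(scaled); // [10, 20, 30]

        Function<Integer, Integer> keepEven = v -> v % 2 == 0 ? v : null;
        System.out.println(transformOrRemove(List.of(2, 4, 5, 6), keepEven)); // [2, 4, 6]

        List<Integer> evens = List.of(2, 4, 6);
        System.out.println(transformOrRemove(evens, keepEven) == evens); // true
    }
}
```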

[jira] [Commented] (CASSANDRA-17277) Automate updating tickets with CI results after merge

2022-02-19 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494980#comment-17494980
 ] 

Michael Semb Wever commented on CASSANDRA-17277:


bq. Notably, the butler run for 940 doesn't show the TestArchiveCommitlog 
failure, however in Jenkins if you go to all test failure results for the build 
you see it there

I don't get what the intent of this sentence is… 

Butler seems to store only 16 builds, and ci-cassandra.a.o stores only 30 
builds.

bq. The above output is created by taking the passed in build number and 
walking back (and locally caching) the results of all valid previous builds. 
This local data is kept in a simple JSON file per branch which we then use for 
subsequent calculation of the # of runs we've seen a given test in and the # of 
failures; …

This then becomes much more than a cache; it's an important store of history. 
We have flakies that are 1:100, well beyond both butler's and ci-cassandra.a.o's 
history. Also, the challenge that each Jenkins agent needs to store its own 
cache (with concurrent-safe access from both executors) is exacerbated because 
jobs are sticky to an agent and only get spawned on a new agent when saturation 
requires it. That is, every 40 builds (for example) a job could run on a new agent where the 
cache needs to be reconstructed, but because it cannot fetch anything older than 
30 builds, that commit and subsequent ones are going to get associated with past 
flakies…  

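The concern about retention can be illustrated with a small sketch of the counting step from the quoted description (number of runs and number of failures per test). The Build type and the in-memory history are assumptions. The real tooling keeps a JSON file per branch, which is not modelled here. With a test that fails once in 100 builds, a history limited to the last 30 builds does not record the failure at all:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class TestHistorySketch {
    // Hypothetical shape of one cached build: test name -> passed?
    static final class Build {
        final int number;
        final Map<String, Boolean> results;

        Build(int number, Map<String, Boolean> results) {
            this.number = number;
            this.results = results;
        }
    }

    // Counts runs and failures per test: counts[0] = runs, counts[1] = failures
    static Map<String, int[]> aggregate(List<Build> history) {
        Map<String, int[]> counts = new HashMap<>();
        for (Build b : history)
            for (Map.Entry<String, Boolean> e : b.results.entrySet()) {
                int[] c = counts.computeIfAbsent(e.getKey(), k -> new int[2]);
                c[0]++;
                if (!e.getValue())
                    c[1]++;
            }
        return counts;
    }

    public static void main(String[] args) {
        // 100 builds in which a 1:100 flaky test fails once, at build 5
        List<Build> all = new ArrayList<>();
        for (int n = 1; n <= 100; n++)
            all.add(new Build(n, Map.of("flakyTest", n != 5)));
        // Only the most recent 30 builds are still fetchable from the CI server
        List<Build> retained = all.subList(70, 100);

        int[] full = aggregate(all).get("flakyTest");
        int[] window = aggregate(retained).get("flakyTest");
        System.out.println("runs=" + full[0] + ", failures=" + full[1]);     // runs=100, failures=1
        System.out.println("runs=" + window[0] + ", failures=" + window[1]); // runs=30, failures=0
    }
}
```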
> Automate updating tickets with CI results after merge
> -
>
> Key: CASSANDRA-17277
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17277
> Project: Cassandra
>  Issue Type: Task
>  Components: Build, CI
>Reporter: Josh McKenzie
>Assignee: Josh McKenzie
>Priority: Normal
>
> From [the wiki 
> article|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199530280]
> {quote}ci-cassandra run bot updating ticket in JIRA w/state of test run for 
> the SHA (JIRA Pending; to be linked)
> {quote}
> We already run CI for every commit (see 
> [example|https://ci-cassandra.apache.org/job/Cassandra-trunk/]). The goal is 
> to have automation that'll tie the results of a commit to the original JIRA 
> ticket and update it automatically w/a comment indicating:
>  * The results of the CI run (duration, pass, fail, # failures)
>  * Any potential regressions in CI from the merge w/links to job history 
> ([example|https://ci-cassandra.apache.org/job/Cassandra-trunk/912/testReport/junit/dtest.cqlsh_tests.test_cqlsh_copy/TestCqlshCopy/test_bulk_round_trip_with_timeouts/history/])
> From a workflow perspective we want to optimize for as little friction as 
> possible in seeing the impact of one's commit on ci-cassandra and rapidly 
> verifying whether the change appears to have destabilized any tests on that 
> infrastructure.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-17293) Update python test framework from nose to pytest

2022-02-19 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494969#comment-17494969
 ] 

Brandon Williams commented on CASSANDRA-17293:
--

I am out for a week but am +1 here if CI looks good.

> Update python test framework from nose to pytest
> 
>
> Key: CASSANDRA-17293
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17293
> Project: Cassandra
>  Issue Type: Task
>  Components: CQL/Interpreter
>Reporter: Brad Schoening
>Assignee: Brad Schoening
>Priority: Normal
> Fix For: 4.x
>
>
> I had trouble trying to install and run the python nose test from pip 
> (nosetest not found).
> According to the homepage of nose at [https://nose.readthedocs.io/en/latest/]
> h1. _Note to Users_
> _Nose has been in maintenance mode for the past several years and will likely 
> cease without a new person/team to take over maintainership. New projects 
> should consider using [Nose2|https://github.com/nose-devs/nose2], 
> [py.test|http://pytest.org/], or just plain unittest/unittest2._
>  
> Upgrading to pytest is likely the least effort. 






[jira] [Updated] (CASSANDRA-17338) Fix flaky test - test_cqlsh_completion.TestCqlshCompletion

2022-02-19 Thread Aleksei Zotov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei Zotov updated CASSANDRA-17338:
--
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/d17b16c9b1bf3e325d415b3777c2a7bd24c764ce
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Merged as 
[https://github.com/apache/cassandra/commit/d17b16c9b1bf3e325d415b3777c2a7bd24c764ce].
 Thanks for the quick review!

> Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
> --
>
> Key: CASSANDRA-17338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17338
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
>Reporter: Brandon Williams
>Assignee: Aleksei Zotov
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
>  Failed 4 times in the last 24 runs. Flakiness: 30%, Stability: 83%
> A bunch of the test_completion_* tests fail occasionally with an 
> eyebleed-inducing mismatched output.






[jira] [Updated] (CASSANDRA-17338) Fix flaky test - test_cqlsh_completion.TestCqlshCompletion

2022-02-19 Thread Aleksei Zotov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei Zotov updated CASSANDRA-17338:
--
Fix Version/s: 3.0.27
   3.11.13
   (was: 3.0.x)
   (was: 3.11.x)

> Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
> --
>
> Key: CASSANDRA-17338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17338
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
>Reporter: Brandon Williams
>Assignee: Aleksei Zotov
>Priority: Normal
> Fix For: 3.0.27, 3.11.13
>
>
>  Failed 4 times in the last 24 runs. Flakiness: 30%, Stability: 83%
> A bunch of the test_completion_* tests fail occasionally with an 
> eyebleed-inducing mismatched output.






[cassandra] branch cassandra-3.0 updated (679740f -> d17b16c)

2022-02-19 Thread azotcsit
This is an automated email from the ASF dual-hosted git repository.

azotcsit pushed a change to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 679740f  Added CVE-2021-44521 to CHANGES.txt, NEWS.txt
 add d17b16c  Fix flaky test - test_cqlsh_completion.TestCqlshCompletion

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt  |  1 +
 pylib/cqlshlib/test/run_cqlsh.py |  4 ++--
 pylib/cqlshlib/test/test_cqlsh_completion.py | 19 ---
 3 files changed, 19 insertions(+), 5 deletions(-)




[cassandra] branch cassandra-4.0 updated (85fd49f -> 5bc9f7c)

2022-02-19 Thread azotcsit
This is an automated email from the ASF dual-hosted git repository.

azotcsit pushed a change to branch cassandra-4.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 85fd49f  Merge branch 'cassandra-3.11' into cassandra-4.0
 add d17b16c  Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
 add c3d51a8  Merge branch 'cassandra-3.0' into cassandra-3.11
 add 5bc9f7c  Merge branch 'cassandra-3.11' into cassandra-4.0

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)




[cassandra] branch cassandra-3.11 updated (593872c -> c3d51a8)

2022-02-19 Thread azotcsit
This is an automated email from the ASF dual-hosted git repository.

azotcsit pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 593872c  Merge branch 'cassandra-3.0' into cassandra-3.11
 add d17b16c  Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
 add c3d51a8  Merge branch 'cassandra-3.0' into cassandra-3.11

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt  |  3 ++-
 pylib/cqlshlib/test/run_cqlsh.py |  4 ++--
 pylib/cqlshlib/test/test_cqlsh_completion.py | 19 ---
 3 files changed, 20 insertions(+), 6 deletions(-)




[cassandra] 01/01: Merge branch 'cassandra-4.0' into trunk

2022-02-19 Thread azotcsit
This is an automated email from the ASF dual-hosted git repository.

azotcsit pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit d96c32b0b36d732df898f1d1a732c9398c2d775b
Merge: c08baf2 5bc9f7c
Author: Aleksei Zotov 
AuthorDate: Sat Feb 19 13:39:52 2022 +0400

Merge branch 'cassandra-4.0' into trunk

 CHANGES.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --cc CHANGES.txt
index 9bbb57e,9378003..ab0ff4d
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -138,17 -46,7 +138,18 @@@ Merged from 3.11
   * Add key validation to ssstablescrub (CASSANDRA-16969)
   * Update Jackson from 2.9.10 to 2.12.5 (CASSANDRA-16851)
   * Make assassinate more resilient to missing tokens (CASSANDRA-16847)
 + * Validate SASI tokenizer options before adding index to schema (CASSANDRA-15135)
 + * Fixup scrub output when no data post-scrub and clear up old use of row, which really means partition (CASSANDRA-16835)
 + * Reduce thread contention in CommitLogSegment and HintsBuffer (CASSANDRA-16072)
 + * Make cqlsh use the same set of reserved keywords than the server uses (CASSANDRA-15663)
 + * Optimize bytes skipping when reading SSTable files (CASSANDRA-14415)
 + * Enable tombstone compactions when unchecked_tombstone_compaction is set in TWCS (CASSANDRA-14496)
 + * Read only the required SSTables for single partition queries (CASSANDRA-16737)
  Merged from 3.0:
++ * Fix flaky test - test_cqlsh_completion.TestCqlshCompletion (CASSANDRA-17338)
 + * Fixed TestCqlshOutput failing tests (CASSANDRA-17386)
 + * Lazy transaction log replica creation allows incorrect replica content divergence during anticompaction (CASSANDRA-17273)
 + * LeveledCompactionStrategy disk space check improvements (CASSANDRA-17272)
   * Fix conversion from megabits to bytes in streaming rate limiter (CASSANDRA-17243)
   * Upgrade logback to 1.2.9 (CASSANDRA-17204)
   * Avoid race in AbstractReplicationStrategy endpoint caching (CASSANDRA-16673)




[cassandra] branch trunk updated (c08baf2 -> d96c32b)

2022-02-19 Thread azotcsit
This is an automated email from the ASF dual-hosted git repository.

azotcsit pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from c08baf2  ninja-fix CHANGES.txt so >4.0.0 entries are "Merged in …" entries
 add d17b16c  Fix flaky test - test_cqlsh_completion.TestCqlshCompletion
 add c3d51a8  Merge branch 'cassandra-3.0' into cassandra-3.11
 add 5bc9f7c  Merge branch 'cassandra-3.11' into cassandra-4.0
 new d96c32b  Merge branch 'cassandra-4.0' into trunk

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt | 2 ++
 1 file changed, 2 insertions(+)
