[jira] [Commented] (CASSANDRA-17240) CEP-19: Trie memtable implementation

Alex Petrov (Jira) Wed, 26 Oct 2022 03:59:04 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624364#comment-17624364
 ]


Alex Petrov commented on CASSANDRA-17240:
-----------------------------------------

{quote}I don't think that attributing desires with expressions such as "you 
personally or members of your team" helps us in anything but creating conflict.
{quote}
Re-reading it now, my wording might have been suboptimal, so let me try to 
rephrase. What I meant was that Harry adoption might have been seen as 
unnecessary, or there might be something that was preventing its adoption. 
There was no attempt to attribute desires, rather the opposite: stating that 
there is no (visible) inclination for adoption. This was an unnecessary 
assumption on my part. Regardless, there definitely was no bad intention in 
what I was attempting to convey; on the contrary, I've offered my help with 
Harry tests previously, and have repeated the offer in the last paragraph.
{quote}Are those tests publicly available?
{quote}
Since some of them are Transactional-Metadata specific, I haven't posted them 
just yet. I am working to make them available on trunk, which requires some 
minor changes to them, along with a simple, two-command stress-like tool 
with validation abilities.
{quote}However, eight months have passed and I can't find a single class 
extending that FuzzTestBase
{quote}
Since I was mostly working on Transactional Metadata all that time, my 
intention in pushing CASSANDRA-16262 out was to help folks working on SAI to 
adopt it, as was discussed in the cassandra-sai Slack channel. But most of the 
actual fuzz-testing was as simple as creating clusters and running Harry with 
different schemas and workloads. 
{quote}CASSANDRA-16262 was meant to add fuzz testing for coordination and 
replication. We had it as a blocker for 4.0 for some time, but we finally 
released without it.
{quote}
The code that got merged into the Cassandra tree was intended to enable people 
to write new tests, such as bootstrap/decom and others, in-tree. Fuzz testing 
itself was done by running Harry with different configurations against 
Cassandra clusters. Even though most of these tests did not run on Apache 
infrastructure, all issues found by them were published, and the stability of 
4.0 can, in part, be attributed to that testing, since several issues in the 
storage engine might not have been triggered without it, or would've been 
harder to find. Even 
[ScyllaDB|https://github.com/apache/cassandra-harry/blob/trunk/scylla-usage.md] 
folks are using Harry for validation of foundational functionality. So saying 
that we have released 4.0 without it is a bit unfair.
{quote}Marking any new features as experimental until they are tested with 
Harry is mostly equivalent to force people to use it, isn't it?
{quote}
This heavily depends on the feature. With SAI, we would have to write several 
new models. I can certainly help to write them; we can collaborate on what's to 
be tested and find the best way to model SAI, which is not that hard. With 
features like memtables, again, I'd say that just having a bake test and a 
lengthy read/write workload would give us quite a bit of confidence already. 

Besides, I'm not saying it absolutely has to be Harry; I did mention 
"equivalent rigour" in my previous message. It could be any property-based, 
model-supported integration testing tool that tests the feature not in 
isolation, but tests database behaviour with this feature assumed. Since Harry 
is already available, I just think it makes sense to use it.
{quote}Maybe I'm missing some public, community-owned repo containing a 
gazillion tests using Harry.
{quote}
Thing is, using Harry for testing is really as easy as calling 
{{visitor.visit();}} in a loop, followed by {{model.validate();}}, with any 
additional calls you would like to make (such as streaming) in between. And 
this test is, in itself, a gazillion tests, since {{validate}} tests paging, 
single partition reads, reverse reads, slices, ranges, and so on, while 
{{visit}} tests partition deletions, range tombstones, etc. 
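
To illustrate the shape of such a loop, here is a minimal, self-contained 
sketch. It does not use Harry's actual classes or API: the class name, the 
in-memory {{TreeMap}} standing in for the database, and the array used as the 
model are all assumptions made for illustration. Only the pattern (apply a 
random operation, then validate several kinds of reads against an independent 
model) is taken from the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch of a visit/validate loop; not Harry's real API.
public class VisitValidateSketch {
    static final int KEYS = 64;

    // Stand-in for the database under test; a real run would target a cluster.
    private final TreeMap<Integer, Integer> store = new TreeMap<>();
    // Independent model of expected state: expected[k] == null means "no row".
    private final Integer[] expected = new Integer[KEYS];
    private final Random random;

    VisitValidateSketch(long seed) {
        this.random = new Random(seed);
    }

    /** Applies one random operation to both the store and the model. */
    void visit() {
        int key = random.nextInt(KEYS);
        switch (random.nextInt(3)) {
            case 0: { // write
                int value = random.nextInt();
                store.put(key, value);
                expected[key] = value;
                break;
            }
            case 1: { // single-row deletion
                store.remove(key);
                expected[key] = null;
                break;
            }
            default: { // range deletion over [key, end]
                int end = Math.min(KEYS - 1, key + random.nextInt(8));
                store.subMap(key, true, end, true).clear();
                for (int k = key; k <= end; k++) expected[k] = null;
            }
        }
    }

    /** Reads the store back in several ways and compares each with the model. */
    void validate() {
        check(expectedKeys(0, KEYS - 1, false), new ArrayList<>(store.keySet()), "forward read");
        check(expectedKeys(0, KEYS - 1, true), new ArrayList<>(store.descendingKeySet()), "reverse read");
        int from = random.nextInt(KEYS);
        int to = Math.min(KEYS - 1, from + random.nextInt(16));
        check(expectedKeys(from, to, false),
              new ArrayList<>(store.subMap(from, true, to, true).keySet()), "slice");
        for (int k = 0; k < KEYS; k++)
            if (!Objects.equals(expected[k], store.get(k)))
                throw new AssertionError("value mismatch at key " + k);
    }

    private List<Integer> expectedKeys(int from, int to, boolean reverse) {
        List<Integer> keys = new ArrayList<>();
        for (int k = from; k <= to; k++)
            if (expected[k] != null) keys.add(k);
        if (reverse) Collections.reverse(keys);
        return keys;
    }

    private static void check(List<Integer> want, List<Integer> got, String what) {
        if (!want.equals(got))
            throw new AssertionError(what + " mismatch: expected " + want + " but got " + got);
    }

    /** Runs the loop and returns the number of validated iterations. */
    static int run(long seed, int iterations) {
        VisitValidateSketch sketch = new VisitValidateSketch(seed);
        int validated = 0;
        for (int i = 0; i < iterations; i++) {
            sketch.visit();
            // A real test could trigger extra events here (flush, streaming, ...).
            sketch.validate();
            validated++;
        }
        return validated;
    }

    public static void main(String[] args) {
        System.out.println("validated iterations: " + run(42L, 1_000));
    }
}
```

Because the seed is fixed, a failing run of this sketch can be replayed 
exactly, which is the main practical benefit of driving the operations from a 
seeded generator.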

In case of trie-based memtables, I think we just need to run bake tests for 
several hundred (or more) cluster-hours (which is of course parallelizable), 
and make sure we trigger conditions such as sstable/memtable merges, range 
tombstones, partition deletions, etc.

In order to test SAI we will actually require some new models, and the same 
goes for transactions and transactional metadata, but for features as 
foundational as memtables you don't even need to write any new code.
{quote}Those wanting to use them just used it, and that led others by example. 
I'd suggest the same approach for Harry.
{quote}
Fair enough. My impression was that allowing people to _use_ the code would be 
something that would enable them to test it, but you're right, maybe it didn't 
go far enough. I've put together a simple bake test for trie-based (or any 
other type of) memtables that I'll do my best to publish soon.

> CEP-19: Trie memtable implementation
> ------------------------------------
>
>                 Key: CASSANDRA-17240
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17240
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Memtable
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>            Priority: Normal
>             Fix For: 4.2
>
>         Attachments: SkipListMemtable-OSS.png, TrieMemtable-OSS.png, 
> density_SG.html.gz, density_test_with_sharding.html.gz, latency-1_1-95.png, 
> latency-9_1-95.png, throughput_SG.png, throughput_apache.png
>
>          Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> Trie-based memtable implementation as described in CEP-19, built on top of 
> CASSANDRA-17034 and CASSANDRA-6936.
> The implementation is available in this 
> [branch|https://github.com/blambov/cassandra/tree/CASSANDRA-17240].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
