[jira] [Commented] (CASSANDRA-19497) ResultRetriever should batch clusterings/rows during SAI post-filtering reads

Caleb Rackliffe (Jira) Tue, 05 Nov 2024 11:12:09 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-19497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17895769#comment-17895769 ]


Caleb Rackliffe commented on CASSANDRA-19497:
---------------------------------------------

Now, if we modify the test above to see what happens on the other extreme, i.e. 
global queries over a dataset with single-row partitions...
 
{noformat}
@Test
public void test() throws Throwable
{
    createTable("CREATE TABLE %s (k int, c int, val int, PRIMARY KEY (k, c))");
    createIndex("CREATE INDEX ON %s(val) USING 'sai'");

    for (int i = 0; i < 10000; i++)
        execute("INSERT INTO %s (k, c, val) VALUES (?, ?, ?)", i, 0, i);

    flush();

    Histogram histogram = new Histogram(4);

    for (int i = 0; i < 10000; i++)
    {
        long start = System.nanoTime();
        execute("SELECT k, c FROM %s WHERE val > 9000");
        histogram.recordValue(System.nanoTime() - start);

        if (i % 1000 == 0)
        {
            // Assuming this is HdrHistogram's Histogram, getValueAtPercentile() expects 0-100, not 0-1
            System.err.println("50th: " + histogram.getValueAtPercentile(50.0));
            System.err.println("95th: " + histogram.getValueAtPercentile(95.0));
            System.err.println("99th: " + histogram.getValueAtPercentile(99.0));
        }
    }
}
{noformat}

...we get something like this:

|branch|p50 (nanos)|p99 (nanos)|
|trunk|3,030,015|3,064,319|
|patch w/ 100-row batches (that are actually 1-row batches)|2,934,271|2,966,911|

As expected, there's not much difference. (The thread-local might avoid some 
{{NavigableSet}} creation in {{makeFilter()}} in the patch.)
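
To put "not much difference" in numbers, the relative change can be computed directly from the table. This is only a small arithmetic sketch; the class and method names are made up for illustration, and the inputs are the figures quoted above.

```java
import java.util.Locale;

public class LatencyDelta
{
    /** Percentage by which the patched latency is lower than the baseline (negative means slower). */
    static double percentImprovement(long baselineNanos, long patchedNanos)
    {
        return 100.0 * (baselineNanos - patchedNanos) / baselineNanos;
    }

    public static void main(String[] args)
    {
        // Figures copied from the table above: trunk vs. patch.
        System.out.printf(Locale.ROOT, "p50: %.2f%%%n", percentImprovement(3_030_015L, 2_934_271L));
        System.out.printf(Locale.ROOT, "p99: %.2f%%%n", percentImprovement(3_064_319L, 2_966_911L));
    }
}
```

Both rows come out at roughly 3%, which for a single-node microbenchmark like this one is plausibly within run-to-run noise.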

> ResultRetriever should batch clusterings/rows during SAI post-filtering reads
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19497
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/SAI
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.0.x, 5.x
>
>         Attachments: alloc-trunk.html, cpu-batch-100-19497.png, 
> cpu-trunk-19497.png, cpu-trunk.html, heap-flamegraph.html, 
> wall-no-parked-threads.html
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> SAI currently creates and executes a {{SinglePartitionReadCommand}} for every 
> {{PrimaryKey}} the index produces to read the corresponding row for 
> post-filtering. Informed by the limits present in the read command itself, it 
> should be possible to batch those reads w/ a {{ClusteringIndexNamesFilter}} 
> in many fewer {{SinglePartitionReadCommands}}. When we have a handful of 
> matches in a large partition, this seems like it would involve many fewer seeks, 
> less unnecessary object creation, etc.
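
The grouping idea in the description can be sketched in isolation. The code below is not the patch and does not use Cassandra's classes: {{Key}} and {{Batch}} are hypothetical stand-ins for {{PrimaryKey}} and for one batched read, and the sketch assumes the index yields keys sorted by partition and then clustering.

```java
import java.util.ArrayList;
import java.util.List;

public class ClusteringBatcher
{
    /** Hypothetical stand-in for a PrimaryKey: a partition key plus one clustering value. */
    static final class Key
    {
        final int partition;
        final int clustering;

        Key(int partition, int clustering)
        {
            this.partition = partition;
            this.clustering = clustering;
        }
    }

    /** Hypothetical stand-in for one batched read: a partition and the clusterings to fetch from it. */
    static final class Batch
    {
        final int partition;
        final List<Integer> clusterings;

        Batch(int partition, List<Integer> clusterings)
        {
            this.partition = partition;
            this.clusterings = clusterings;
        }
    }

    /**
     * Groups keys (assumed sorted by partition, then clustering) into batches that
     * never span partitions and never hold more than maxBatchSize clusterings.
     */
    static List<Batch> batch(List<Key> sortedKeys, int maxBatchSize)
    {
        if (maxBatchSize < 1)
            throw new IllegalArgumentException("maxBatchSize must be positive");

        List<Batch> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentPartition = 0; // only meaningful while current is non-empty

        for (Key key : sortedKeys)
        {
            boolean partitionChanged = !current.isEmpty() && key.partition != currentPartition;

            // Close the open batch when the partition changes or the batch is full.
            if (partitionChanged || current.size() == maxBatchSize)
            {
                batches.add(new Batch(currentPartition, current));
                current = new ArrayList<>();
            }

            currentPartition = key.partition;
            current.add(key.clustering);
        }

        if (!current.isEmpty())
            batches.add(new Batch(currentPartition, current));

        return batches;
    }

    public static void main(String[] args)
    {
        // One wide partition with 250 matches: far fewer reads than one per key.
        List<Key> wide = new ArrayList<>();
        for (int c = 0; c < 250; c++)
            wide.add(new Key(0, c));
        System.out.println("wide partition batches: " + batch(wide, 100).size());

        // Single-row partitions: every batch degenerates to one row.
        List<Key> narrow = new ArrayList<>();
        for (int k = 0; k < 250; k++)
            narrow.add(new Key(k, 0));
        System.out.println("single-row partition batches: " + batch(narrow, 100).size());
    }
}
```

With a batch size of 100, the wide partition needs only a few reads, while the single-row case still needs one read per key. That second case is the "100-row batches (that are actually 1-row batches)" situation measured in the comment above.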


