[ https://issues.apache.org/jira/browse/CASSANDRA-19497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17895769#comment-17895769 ]
Caleb Rackliffe commented on CASSANDRA-19497:
---------------------------------------------

Now, if we modify the test above to see what happens at the other extreme, i.e. global queries over a dataset with single-row partitions...

{noformat}
@Test
public void test() throws Throwable
{
    createTable("CREATE TABLE %s (k int, c int, val int, PRIMARY KEY (k, c))");
    createIndex("CREATE INDEX ON %s(val) USING 'sai'");

    for (int i = 0; i < 10000; i++)
        execute("INSERT INTO %s (k, c, val) VALUES (?, ?, ?)", i, 0, i);

    flush();

    Histogram histogram = new Histogram(4);

    for (int i = 0; i < 10000; i++)
    {
        long start = System.nanoTime();
        execute("SELECT k, c FROM %s WHERE val > 9000");
        histogram.recordValue(System.nanoTime() - start);

        if (i % 1000 == 0)
        {
            System.err.println("50th: " + histogram.getValueAtPercentile(0.5));
            System.err.println("95th: " + histogram.getValueAtPercentile(0.95));
            System.err.println("99th: " + histogram.getValueAtPercentile(0.99));
        }
    }
}
{noformat}

...we get something like this:

|branch|p50 (nanos)|p99 (nanos)|
|trunk|3,030,015|3,064,319|
|patch w/ 100-row batches (that are actually 1-row batches)|2,934,271|2,966,911|

As expected, there's not much difference. (The thread-local in the patch might avoid some {{NavigableSet}} creation in {{makeFilter()}}.)
> ResultRetriever should batch clusterings/rows during SAI post-filtering reads
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19497
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/SAI
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.0.x, 5.x
>
>         Attachments: alloc-trunk.html, cpu-batch-100-19497.png,
> cpu-trunk-19497.png, cpu-trunk.html, heap-flamegraph.html,
> wall-no-parked-threads.html
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> SAI currently creates and executes a {{SinglePartitionReadCommand}} for every
> {{PrimaryKey}} the index produces to read the corresponding row for
> post-filtering. Informed by the limits present in the read command itself, it
> should be possible to batch those reads w/ a {{ClusteringIndexNamesFilter}}
> in many fewer {{SinglePartitionReadCommands}}. When we have a handful of
> matches in a large partition, this seems like it would involve many fewer
> seeks, less unnecessary object creation, etc.
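A rough way to see why batching helps in a wide partition but degenerates to 1-row batches with single-row partitions: a batch can only cover clusterings of one partition, because each batch maps to one single-partition read. The sketch below is a hypothetical, self-contained model, not the patch or Cassandra's {{ResultRetriever}}. {{KeyBatcher}}, {{Key}}, {{batch()}} and {{maxBatchSize}} are illustrative names; it assumes keys arrive sorted by partition and then clustering, and that the batch cap (100 here) would in practice be informed by the read command's limits.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyBatcher
{
    // Hypothetical stand-in for an SAI PrimaryKey: a partition key plus one clustering value.
    record Key(int partition, int clustering) {}

    // Groups keys (assumed sorted by partition, then clustering) into batches that never
    // span partitions and hold at most maxBatchSize clusterings. In this model, each batch
    // stands for one single-partition read using a names filter over its clusterings.
    static List<List<Key>> batch(List<Key> sortedKeys, int maxBatchSize)
    {
        if (maxBatchSize < 1)
            throw new IllegalArgumentException("maxBatchSize must be >= 1");

        List<List<Key>> batches = new ArrayList<>();
        List<Key> current = new ArrayList<>();
        for (Key key : sortedKeys)
        {
            boolean newPartition = !current.isEmpty()
                                   && current.get(current.size() - 1).partition() != key.partition();
            // Close the current batch when the partition changes or the cap is reached.
            if (newPartition || current.size() == maxBatchSize)
            {
                batches.add(current);
                current = new ArrayList<>();
            }
            current.add(key);
        }
        if (!current.isEmpty())
            batches.add(current);
        return batches;
    }

    public static void main(String[] args)
    {
        // One wide partition with 250 matching rows: 3 reads instead of 250.
        List<Key> wide = new ArrayList<>();
        for (int c = 0; c < 250; c++)
            wide.add(new Key(0, c));
        System.out.println("wide partition: " + batch(wide, 100).size() + " reads");

        // Single-row partitions matching val > 9000 (k = 9001..9999): every batch holds one row.
        List<Key> narrow = new ArrayList<>();
        for (int k = 9001; k < 10000; k++)
            narrow.add(new Key(k, 0));
        System.out.println("single-row partitions: " + batch(narrow, 100).size() + " reads");
    }
}
```

Running {{main}} prints "wide partition: 3 reads" followed by "single-row partitions: 999 reads". The second case issues exactly as many reads as the unbatched approach, which is consistent with the small difference measured above.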