[
https://issues.apache.org/jira/browse/CASSANDRA-19987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052700#comment-18052700
]
Sam Lightfoot edited comment on CASSANDRA-19987 at 1/18/26 1:55 PM:
--------------------------------------------------------------------
*Test Scenario:*
* 6GB cgroup memory limit, 3GB JVM heap
* 10k/s read workload against 1GB hot dataset
* Compaction: 2x 64GB SSTables (STCS)
* Duration: 10 minutes of concurrent compaction + reads
*Memory Pressure Stall Comparison (PSI metrics):*
||Metric||Direct I/O||Buffered I/O||Difference||Notes||
|Mean Stall|25.0 ms/s|32.4 ms/s|+29%| |
|p99 Stall|78.6 ms/s|90.5 ms/s|+15%| |
|Max Stall|117.6 ms/s|176.2 ms/s|+50%| |
|Total Stall Time|15.0 s|19.4 s|+29%| |
|Spikes >50 ms/s|15|26|+73%| |
|Active_File Std Dev|10.15 MB|24.98 MB|+146%|Active_File is the page cache hot (active) list size|
*Conclusion:* Using direct I/O for compaction reads reduces memory pressure stalls by 29% and keeps the page cache active list 2.5x more stable. Buffered I/O pollutes the page cache, causing LRU churn that evicts hot data and triggers direct reclaim stalls, which directly impacts client read latency.
{*}Raw data{*}: see _direct_page_cache_compaction_ and _buffered_page_cache_compaction_
{*}Command{*}: {{cat /sys/fs/cgroup/system.slice/cassandra-test.scope/memory.pressure}}
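For reference, the stall rates above come from sampling the cumulative {{total=}} counter in that file and differencing successive samples. A minimal sketch of that parsing, assuming the standard cgroup v2 PSI format (the class name and sample strings are illustrative, not part of the test harness):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PsiParser {
    // Matches the "some" line of a cgroup v2 memory.pressure snapshot, e.g.
    //   some avg10=1.20 avg60=0.80 avg300=0.30 total=15000000
    // and captures the cumulative stall time (microseconds since boot).
    private static final Pattern SOME_TOTAL =
            Pattern.compile("^some .*total=(\\d+)", Pattern.MULTILINE);

    static long someTotalMicros(String pressure) {
        Matcher m = SOME_TOTAL.matcher(pressure);
        if (!m.find()) throw new IllegalArgumentException("no 'some' line in PSI output");
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        // Two snapshots taken one second apart (hypothetical values).
        String t0 = "some avg10=1.20 avg60=0.80 avg300=0.30 total=15000000\n"
                  + "full avg10=0.10 avg60=0.05 avg300=0.01 total=900000\n";
        String t1 = "some avg10=1.25 avg60=0.82 avg300=0.31 total=15025000\n"
                  + "full avg10=0.10 avg60=0.05 avg300=0.01 total=905000\n";
        // Differencing the cumulative counters over a 1 s interval yields
        // the stall rate in microseconds per second (here 25,000 us/s = 25 ms/s).
        long stallPerSec = PsiParser.someTotalMicros(t1) - PsiParser.someTotalMicros(t0);
        System.out.println(stallPerSec); // 25000
    }
}
```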
> Direct IO support for compaction reads
> --------------------------------------
>
> Key: CASSANDRA-19987
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19987
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Local/Compaction
> Reporter: Jon Haddad
> Assignee: Sam Lightfoot
> Priority: Normal
> Fix For: 5.x
>
> Attachments: buffered_during_compaction-reads_1m.txt,
> buffered_during_compaction-reads_5m.txt, buffered_page_cache_compaction,
> direct_during_compaction-reads_1m.txt, direct_during_compaction-reads_5m.txt,
> direct_page_cache_compaction, image-2025-12-24-10-20-44-947.png,
> image-2025-12-24-10-21-11-928.png, image-2025-12-24-10-34-10-834.png
>
> Time Spent: 4h
> Remaining Estimate: 0h
>
> If we use direct I/O to read SSTables during compaction, we can avoid
> polluting the page cache with data we're about to delete. As a side effect
> of buffered reads, we also evict pages to make room for the data being read
> in. This unnecessary churn leads to higher CPU overhead and can cause dips
> in client read latency, as we end up evicting pages that could be used to
> serve those reads.
> This is most notable with STCS as the SSTables get larger, potentially
> evicting the entire hot dataset out of cache, but every compaction strategy
> is affected.
> This is a follow-up to be done after CASSANDRA-15452, since we will have an
> internal buffer.
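For context, direct I/O can be requested from Java NIO since JDK 10 via {{com.sun.nio.file.ExtendedOpenOption.DIRECT}}. A minimal sketch of such a read, bypassing the page cache; the path, buffer size, and alignment helper below are illustrative, not Cassandra's actual compaction reader:

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DirectReadSketch {
    // O_DIRECT requires the buffer address, file offset, and transfer length
    // to be aligned to the filesystem block size (a power of two); round a
    // requested size up to that alignment.
    static int alignUp(int size, int alignment) {
        return (size + alignment - 1) & -alignment;
    }

    public static void main(String[] args) throws Exception {
        Path path = Paths.get(args.length > 0 ? args[0] : "/tmp/sstable-sample");
        int blockSize = (int) Files.getFileStore(path).getBlockSize();
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ,
                                               ExtendedOpenOption.DIRECT)) {
            // allocateDirect + alignedSlice yields a buffer whose start address
            // and capacity are aligned to blockSize, as O_DIRECT demands.
            ByteBuffer buf = ByteBuffer
                    .allocateDirect(alignUp(64 * 1024, blockSize) + blockSize)
                    .alignedSlice(blockSize);
            int n = ch.read(buf);
            System.out.println("read " + n + " bytes, bypassing the page cache");
        }
    }
}
```

Note that some filesystems (e.g. tmpfs) reject O_DIRECT opens, so a buffered fallback would be needed in practice.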