[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945607#comment-13945607 ]
Pavel Yaskevich commented on CASSANDRA-6746:
--------------------------------------------

bq. In practice, moving the WILLNEED into the getSegment() call is dangerous as the segment is used past the initial 64Kb, and if we rely on ourselves only for read-ahead this could result in very substandard performance for larger rows. We also probably want to only WILLNEED the actual size of the buffer we expect to read for compressed files.

Yes, this is only a PoC to see if the scheme works for platters. A couple of things: for optimal performance we need information from the index about the size of the row, so we can mark SEQUENTIAL a) the whole row, if the row is smaller than the indexing threshold, or b) portions of the row on the index boundaries. The original 1-page WILLNEED (very conservative) is there to make sure a read can quickly grab the first portion of the buffer while the extended read-ahead prefetches everything else. This still works for big rows because we are forced to read the header of the row first (the key, at least); then, when we seek() to the position indicated by the column index, we hint that we are going to read that portion of the row. So large rows suffer more from the fact that we have to over-buffer than from WILLNEED. I wish we had a useful mmap'ed buffer implementation, so that madvise (like the fadvise we do now) would no longer be required...

There is a way to solve the cold-cache problem for the parts of the original SSTables' data that have been read before: I did some work with mincore() previously and can revisit it if needed. The problem we are trying to solve by dropping the cache for flushed memtables and compacted SSTables (on memory-restricted and/or slow-I/O systems) is that keeping page cache for the old files creates more jitter and slows down warmup of the newly created SSTable.
> Reads have a slow ramp up in speed
> ----------------------------------
>
>                 Key: CASSANDRA-6746
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6746
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ryan McGuire
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1 beta2
>
>         Attachments: 2.1_vs_2.0_read.png, 6746-buffered-io-tweaks.png, 6746-patched.png, 6746.blockdev_setra.full.png, 6746.blockdev_setra.zoomed.png, 6746.buffered_io_tweaks.logs.tar.gz, 6746.buffered_io_tweaks.write-flush-compact-mixed.png, 6746.buffered_io_tweaks.write-read-flush-compact.png, 6746.txt, buffered-io-tweaks.patch, cassandra-2.0-bdplab-trial-fincore.tar.bz2, cassandra-2.1-bdplab-trial-fincore.tar.bz2
>
>
> On a physical four node cluster I am doing a big write and then a big read. The read takes a long time to ramp up to respectable speeds.
> !2.1_vs_2.0_read.png!
> [See data here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1]

--
This message was sent by Atlassian JIRA
(v6.2#6252)