[ 
https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945607#comment-13945607
 ] 

Pavel Yaskevich commented on CASSANDRA-6746:
--------------------------------------------

bq. In practice, moving the WILLNEED into the getSegment() call is dangerous as 
the segment is used past the initial 64Kb, and if we rely on ourselves only for 
read-ahead this could result in very substandard performance for larger rows. 
We also probably want to only WILLNEED the actual size of the buffer we expect 
to read for compressed files.

Yes, this is only a PoC to see whether the scheme works for platters. A couple of 
notes: for optimal performance we need information from the index about the size 
of the row, so that we can mark as SEQUENTIAL a) the whole row, if the row is 
smaller than the indexing threshold, or b) portions of the row on the index 
boundaries. The original one-page WILLNEED (very conservative) is there to make 
sure a read can quickly grab the first portion of the buffer while the extended 
read-ahead prefetches everything else. This still works for big rows because we 
are forced to read the header of the row first (the key at least); then, when we 
seek() to the position indicated by the column index, we hint that we are going 
to read that portion of the row. So large rows suffer more from the fact that we 
have to over-buffer than from WILLNEED. I wish we had a useful mmap'ed buffer 
implementation, so that madvise, like the fadvise we do now, would no longer be 
required...

There is a way to solve the cold-cache problem for the parts of the data that 
had already been read from the original SSTables: I did some work with mincore() 
previously and can revisit it if needed. The problem we are trying to solve by 
dropping the cache for memtable flushes and compacted SSTables (on memory 
restricted and/or slow I/O systems) is that keeping the page cache for the old 
files creates more jitter and slows down warmup of the newly created SSTable. 

 

> Reads have a slow ramp up in speed
> ----------------------------------
>
>                 Key: CASSANDRA-6746
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6746
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ryan McGuire
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1 beta2
>
>         Attachments: 2.1_vs_2.0_read.png, 6746-buffered-io-tweaks.png, 
> 6746-patched.png, 6746.blockdev_setra.full.png, 
> 6746.blockdev_setra.zoomed.png, 6746.buffered_io_tweaks.logs.tar.gz, 
> 6746.buffered_io_tweaks.write-flush-compact-mixed.png, 
> 6746.buffered_io_tweaks.write-read-flush-compact.png, 6746.txt, 
> buffered-io-tweaks.patch, cassandra-2.0-bdplab-trial-fincore.tar.bz2, 
> cassandra-2.1-bdplab-trial-fincore.tar.bz2
>
>
> On a physical four node cluster I am doing a big write and then a big read. 
> The read takes a long time to ramp up to respectable speeds.
> !2.1_vs_2.0_read.png!
> [See data 
> here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1]



--
This message was sent by Atlassian JIRA
(v6.2#6252)
