[ 
https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013646#comment-13013646
 ] 

Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

Scratch the bit about mmap; I made a boo-boo grepping.

I forgot to address the seeking, though. Advising larger regions might help, 
depending on I/O scheduling behavior, in the sense that it may make it easier 
for the kernel/storage to do sequential reads of adjacent blocks. However, my 
main concern was the seeking fundamentally required by a sparsely hot sstable. 
That is not necessarily a problem as such, as long as everyone is aware that 
the seeks will be happening and that they may add significantly to the time 
required to complete compaction.
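
For concreteness, this is roughly the userland call I am talking about when 
advising a larger region (a minimal sketch only; the function name and the 
choice of offset/length are illustrative, not taken from the actual patch):

#define _GNU_SOURCE        /* posix_fadvise */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Illustrative helper: hint the kernel to pre-read [offset, offset+length)
 * of the given sstable file.  Whether the resulting I/O is issued as nice
 * sequential reads of adjacent blocks or as scattered seeks is up to the
 * kernel/I/O scheduler, which is the uncertainty discussed above. */
int preheat_region(const char *path, off_t offset, off_t length)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    /* Non-blocking hint: the kernel queues the read-in asynchronously. */
    int err = posix_fadvise(fd, offset, length, POSIX_FADV_WILLNEED);
    if (err != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
    close(fd);
    return err ? -1 : 0;
}

The size of the advised region is exactly the trade-off: larger regions give 
the kernel a better chance at sequential I/O, smaller ones avoid pulling in 
cold data.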

The main outright danger would be if the I/O imposed by WILLNEED is 
sufficiently detrimental to live reads that it causes more problems (albeit 
perhaps over a briefer period) than the negative effects of compaction this 
ticket is trying to address.

Exactly how it will behave I'm not sure (I lack detailed kernel knowledge), 
but if the I/O implied by WILLNEED (which in the .32 case seemed to yield the 
~5 queue depth) is roughly equal to what it would have been had we had ~5 
threads doing the concurrent pre-heating in userland, I would expect a 
persistent queue depth of 5 to have a significant effect on live reads, 
particularly on single-disk systems and few-disk RAIDs. Unless some 
prioritization is going on, a lone live read should on average have to wait 
for ~5 reads to complete before it gets its turn. For a more concurrent live 
read workload the impact on aggregate latency should be smaller, since the 
extra queue depth of 5 implied by WILLNEED is smaller in relative terms.
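
To put rough numbers on that (assuming something like 8 ms per random read on 
a single spindle, a figure I am picking purely for illustration): a lone live 
read queued behind ~5 pre-heat reads waits on the order of 5 * 8 ms = 40 ms 
before its own ~8 ms read is serviced, i.e. roughly 5-6x its usual latency.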

Actually, a complicating factor I just noticed is that the kernel seems to be 
pulling in a lot more data than intended. The average request size claimed by 
iostat is roughly 400 KB, while the normal stress.py read test on this data 
set (-S 3000) seems to generate more like 50 KB. If that extra data is 
retained in the page cache without expedited eviction, that's potentially bad.
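
If we want to check what actually ends up resident, something along these 
lines would tell us whether the extra data per request is being retained (a 
rough sketch using mincore; the function name and error handling are mine):

#define _GNU_SOURCE        /* mincore */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Illustrative helper: count how many pages of a file region are currently
 * resident in the page cache.  offset must be page-aligned.  Mapping alone
 * does not fault pages in, so this should not perturb the measurement. */
long resident_pages(const char *path, off_t offset, size_t length)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    void *map = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, offset);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (length + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec != NULL && mincore(map, length, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set => page is cached */
    }
    free(vec);
    munmap(map, length);
    return resident;   /* number of cached pages, or -1 on error */
}

Crude, but enough to tell whether those ~400 KB requests stay in the cache or 
get evicted again quickly.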



> Migrate cached pages during compaction 
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: Pavel Yaskevich
>             Fix For: 0.7.5, 0.8
>
>         Attachments: 
> 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt, 
> 1902-BufferedSegmentedFile-logandsleep.txt, 1902-formatted.txt, 
> 1902-per-column-migration-rebase2.txt, 1902-per-column-migration.txt, 
> CASSANDRA-1902-v3.patch, CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch, 
> CASSANDRA-1902-v6.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a 
> pre-compacted CF during the compaction process.  This is now important since 
> CASSANDRA-1470 caches effectively nothing.  
> For example, an active CF being compacted hurts reads since nothing is 
> cached in the new SSTable. 
> The purpose of this ticket then is to make sure SOME data is cached from 
> active CFs. This can be done by monitoring which old SSTables are in the 
> page cache and caching active rows in the new SSTable.
> A simpler yet similar approach is described here: 
> http://insights.oetiker.ch/linux/fadvise/

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
