[ 
https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030125#comment-13030125
 ] 

Chris Goffinet commented on CASSANDRA-1902:
-------------------------------------------

Peter,

In the modifications I made for 1902, when we open SSTable files for 
compaction, I run mincore across each file to determine which pages are already 
hot, using the hooks in 1902. I use this information in rebuffer() to call 
DONTNEED, after each read() call, on pages that were cold to begin with. The 
kernel is ignoring a percentage of those hints. Since such a page is now 'hot', 
when we mark the hot pages for the new SSTable we migrate more than we need to. 
For example, we ran a test where we made sure that on memtable flush we called 
DONTNEED on the entire file, and we verified that the flushed files were not in 
cache. When compaction then kicked in, all pages were cold, so the new SSTables 
should not have been in cache either. What we observed was that the final file, 
after a large series of flushes and compactions, ended up 50% in page cache 
over a long period of time, even though we purposely told the OS we didn't want 
the pages in cache as we read them.
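
As a rough illustration of the bookkeeping described above (not the actual 
patch, which lives in Java inside rebuffer()), the sketch below takes a 
per-page residency vector and issues DONTNEED only for the pages in the range 
just read that were cold beforehand. The names cold_ranges and drop_cold_pages 
are hypothetical, and the residency vector is assumed to have been filled in 
from mincore(2) when the file was opened.

```python
import mmap
import os

PAGE_SIZE = mmap.PAGESIZE


def cold_ranges(resident, offset=0, length=None, page_size=PAGE_SIZE):
    """Coalesce the cold pages touched by a read into (offset, length) ranges.

    `resident` holds one boolean per page of the file (assumed to come from
    mincore(2) at open time). `offset` and `length` describe the bytes that
    were just read; by default the whole file is considered.
    """
    first = offset // page_size
    if length is None:
        last = len(resident)
    else:
        # Round the end of the read up to a page boundary.
        last = min(len(resident), -(-(offset + length) // page_size))
    ranges = []
    start = None
    for page in range(first, last):
        hot = resident[page]
        if not hot and start is None:
            start = page  # a run of cold pages begins here
        elif hot and start is not None:
            ranges.append((start * page_size, (page - start) * page_size))
            start = None
    if start is not None:  # the cold run extends to the end of the read
        ranges.append((start * page_size, (last - start) * page_size))
    return ranges


def drop_cold_pages(fd, resident, offset, length, advise=None):
    """Hint DONTNEED for pages in the read range that were cold to begin with."""
    if advise is None:
        advise = os.posix_fadvise  # Linux/Unix only
    for off, n in cold_ranges(resident, offset, length):
        advise(fd, off, n, os.POSIX_FADV_DONTNEED)
```

Calling drop_cold_pages(fd, resident, offset, length) after each read() would 
mirror the flow described here. The hint remains advisory, so the kernel may 
still keep some of those pages resident.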

Jake:

The problem with that approach is that we still need to make sure that, as we 
read data from disk, a page that is cold stays cold. Keeping statistics helps 
us avoid migrating pages from cold to hot, but since we still have to read the 
file during compaction, we still need to call DONTNEED on pages that were cold 
to begin with. That is what is causing the issue: we know up front that a page 
is cold, but the kernel is not respecting the DONTNEED. I thought it might be 
related to readahead, so I made sure to fadvise FADV_RANDOM, but that wasn't 
the issue either.

Jonathan:

Yeah, we run with CASSANDRA-2156, which has helped us a lot with performance 
consistency. We have certain workloads that need to read recently written data, 
so we disabled calling posix_fadvise(fd, 0, 0) during memtable flushes. We 
actually found that writing new data and just letting the kernel manage the 
pages worked better than the 1902 solution, because we were calling WILLNEED on 
pages that were never being read to begin with.
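
For reference, a whole-file hint at flush time with a switch to turn it off 
might look like the sketch below. It assumes the posix_fadvise(fd, 0, 0) call 
mentioned above is a whole-file DONTNEED, which the comment does not spell 
out. The function name finish_flush and the drop_from_cache flag are 
hypothetical.

```python
import os


def finish_flush(fd, drop_from_cache=True, advise=None):
    """Sync a freshly written file, then optionally hint it out of page cache.

    Returns True if the DONTNEED hint was issued, False if it was skipped.
    """
    # Dirty pages are not dropped by DONTNEED, so write them back first.
    os.fsync(fd)
    if not drop_from_cache:
        return False  # leave the new pages for the kernel to manage
    if advise is None:
        advise = os.posix_fadvise  # Linux/Unix only
    # Offset 0 with length 0 covers the file from the start through its end.
    advise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    return True
```

Passing drop_from_cache=False corresponds to the setup described here, where 
recently written data needs to stay readable from cache.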

One last thing to try would be keeping track of the last (offset, length) so 
that, after each read(), I call fadvise on the previous pair instead of the 
range I just read. The ignored hints might be related to the current refcount. 
I will try this out tonight and update the ticket.
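
A minimal sketch of that one-read-behind idea follows. It is an illustration 
under assumptions, not the planned patch: the class name LaggedDontNeed is 
hypothetical, and for brevity it hints every range it has read rather than 
only the pages that were cold beforehand.

```python
import os


class LaggedDontNeed:
    """Issue DONTNEED for the previous read's range, not the current one."""

    def __init__(self, fd, advise=None):
        self.fd = fd
        self.previous = None  # (offset, length) of the read before this one
        self._advise = advise or self._fadvise

    def _fadvise(self, offset, length):
        os.posix_fadvise(self.fd, offset, length, os.POSIX_FADV_DONTNEED)

    def read(self, offset, length):
        data = os.pread(self.fd, length, offset)
        if self.previous is not None:
            # Hint the range read one call ago, which is no longer in use.
            self._advise(*self.previous)
        # Skip empty reads: a length of 0 would mean "through end of file".
        self.previous = (offset, len(data)) if data else None
        return data

    def close(self):
        """Hint the final range, since no later read will do it."""
        if self.previous is not None:
            self._advise(*self.previous)
            self.previous = None
```

The close() step matters because the last range read would otherwise never 
receive its hint.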

> Migrate cached pages during compaction 
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: 
> 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt, 
> 1902-BufferedSegmentedFile-logandsleep.txt, 1902-formatted.txt, 
> 1902-per-column-migration-rebase2.txt, 1902-per-column-migration.txt, 
> CASSANDRA-1902-v10-trunk-rebased.patch, CASSANDRA-1902-v3.patch, 
> CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch, CASSANDRA-1902-v6.patch, 
> CASSANDRA-1902-v7.patch, CASSANDRA-1902-v8.patch, 
> CASSANDRA-1902-v9-trunk-rebased.patch, 
> CASSANDRA-1902-v9-trunk-with-jmx.patch, CASSANDRA-1902-v9-trunk.patch, 
> CASSANDRA-1902-v9.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a 
> pre-compacted CF during the compaction process.  This is now important since 
> CASSANDRA-1470 caches effectively nothing.  
> For example an active CF being compacted hurts reads since nothing is cached 
> in the new SSTable. 
> The purpose of this ticket then is to make sure SOME data is cached from 
> active CFs. This can be done by monitoring which old SSTables are in the page 
> cache and caching active rows in the new SSTable.
> A simpler yet similar approach is described here: 
> http://insights.oetiker.ch/linux/fadvise/

