[ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012783#comment-13012783 ]

Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

Catching up on the ticket history and the latest version of the patch, here are
a few observations based on the history and the patch themselves (I have not
tested or benchmarked anything):

With respect to avoiding waiting on GC: since the munmap() is still in
finalize(), we're still waiting on GC, right? Just not on every possible
ByteBuffer (instead only on the MappedFileSegment itself).

BufferedSegmentedFile.tryPreserveFilePageCache() calls
tryPreserveCacheRegion() for every page considered hot. The first thing to be
aware of, then, is that this translates into a posix_fadvise() syscall for
every page, even when all or almost all pages are in fact in memory. This may
be acceptable, but keep in mind that use cases where all or almost all pages
are in cache are likely to be the ones that are CPU-bound rather than
disk-bound.
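
To make the per-page pattern concrete, here is a minimal C sketch of what this
amounts to at the syscall level (the fd, the hotness vector and the loop are
hypothetical stand-ins for the patch's Java bookkeeping, not its actual code):

    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <unistd.h>

    /* Per-page preservation: one posix_fadvise(WILLNEED) syscall per hot
     * page, issued even when the page is already resident in memory. */
    static void preserve_per_page(int fd, const char *hot, long npages)
    {
        long page = sysconf(_SC_PAGESIZE);
        for (long i = 0; i < npages; i++)
            if (hot[i])
                posix_fadvise(fd, i * page, page, POSIX_FADV_WILLNEED);
    }

For a fully cached 1 GB sstable with 4 KiB pages, that is roughly 260,000
syscalls that accomplish nothing.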

The bigger issue with the same code path is that in the case of the large
column families we're trying to optimize for, unless I am missing something,
the preservation process is expected to be entirely seek-bound for sparsely
hot sstables. In the best case, for mostly-hot sstables, it might not be
seek-bound provided that pre-fetching and/or read-ahead and/or linear access
detection is working well, but that seems very dependent on system details and
the type of load the system is under (probably less likely to work well under
high "live" read I/O loads). In the non-best case (sparsely hot), it should
most definitely be entirely seek-bound.

fadvising entire regions at once instead of once per page might improve that,
but I still think the better solution is simply not to DONTNEED hot data in
the first place (subject to potential limitations to avoid overly frequent
DONTNEEDs).
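
For comparison with the per-page loop above, a hedged sketch of the
region-coalescing variant (same hypothetical names; one fadvise per contiguous
run of hot pages):

    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <unistd.h>

    /* Coalesced preservation: one posix_fadvise(WILLNEED) per contiguous
     * run of hot pages instead of one syscall per page. */
    static void preserve_coalesced(int fd, const char *hot, long npages)
    {
        long page = sysconf(_SC_PAGESIZE);
        long i = 0;
        while (i < npages) {
            if (!hot[i]) { i++; continue; }
            long start = i;
            while (i < npages && hot[i])
                i++;
            posix_fadvise(fd, start * page, (i - start) * page,
                          POSIX_FADV_WILLNEED);
        }
    }

This cuts the syscall count, but of course does nothing about the seek-bound
reads themselves.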

Note: the original motivation for avoiding frequent DONTNEED calls was the
performance cost of the syscall. But in this case we're taking a "one syscall
per page" hit anyway with the WILLNEEDs. In fact, in the case of a very hot
sstable (where CPU efficiency matters more than for a cold sstable, where disk
I/O matters more), the WILLNEEDs should be more numerous than the DONTNEEDs
would have been had they been "fragmented" according to a hotness map.

Disregarding the CPU-efficiency concerns though, my primary concern is the
WILLNEED calls. Again, I haven't tested to make sure I'm not misreading it,
but this should mean that every compaction of an actively used sstable ends,
after the streaming I/O, with lots of seek-bound reads to fulfill the
WILLNEEDs. This can take a lot of time and be expensive in terms of the amount
of "disk time" spent (relative to a rate-limited compaction process), and it
also violates the otherwise preserved rule that "the only seek-bound I/O is
live reads; all other I/O is sequential".

Also: if WILLNEED blocks until the data has been read, the impact on live
traffic should be limited, but on the other hand the WILLNEED calls themselves
would suffer high latency under read load. If WILLNEED doesn't block,
throughput has a chance of being reasonable by maintaining some queue depth,
but on the other hand it could severely affect live reads. (I don't know which
is true; I should check, but I haven't yet.)
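
One way to check, sketched below: time the WILLNEED call itself against a file
that has just been evicted. A near-instant return on a cold file would suggest
the readahead is merely queued; a long one, that it blocks. (The path is a
placeholder; DONTNEED only drops clean pages, so the test file must not be
dirty.)

    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/bigfile", O_RDONLY);  /* placeholder test file */
        if (fd < 0) { perror("open"); return 1; }

        off_t len = lseek(fd, 0, SEEK_END);
        posix_fadvise(fd, 0, len, POSIX_FADV_DONTNEED);  /* evict first */

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        posix_fadvise(fd, 0, len, POSIX_FADV_WILLNEED);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ms = (t1.tv_sec - t0.tv_sec) * 1e3
                  + (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("WILLNEED over %lld bytes: %.1f ms\n", (long long)len, ms);
        close(fd);
        return 0;
    }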

Minor nit: Seemingly truncated doc string for SegmentedFile.complete().

Minor suggestion: should isRangeInCache() be renamed to wasRangeInCache() to
reflect the fact that it does not represent current status? This is not just
an implementation detail, because if it did reflect current reality the caller
would be incorrect: the per-column test would constantly give false positives
for being in cache, due to (1) the column itself just having been serialized,
which would be easily fixable, but also (2) previous columns landing on the
same page, which is harder to fix than moving a line of code.
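
For context, here is roughly what a "was in cache" snapshot looks like at the
mincore(2) level; a minimal sketch under my assumptions about the design
(snapshot taken once, e.g. when compaction starts, then consulted later), not
the patch's actual code. Re-querying residency during serialization is exactly
what would produce the false positives above.

    #define _DEFAULT_SOURCE
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Snapshot residency of a mapped region once; callers later consult
     * the saved vector instead of asking the kernel again.
     * addr must be page-aligned (e.g. the mmap() base of the sstable). */
    unsigned char *snapshot_residency(void *addr, size_t len, size_t *npages)
    {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        *npages = (len + page - 1) / page;
        unsigned char *vec = malloc(*npages);
        if (vec == NULL || mincore(addr, len, vec) != 0) {
            free(vec);
            return NULL;
        }
        return vec;  /* vec[i] & 1 => page i was resident at snapshot time */
    }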



> Migrate cached pages during compaction 
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>             Fix For: 0.7.5, 0.8
>
>         Attachments: 
> 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt, 
> 1902-formatted.txt, 1902-per-column-migration-rebase2.txt, 
> 1902-per-column-migration.txt, CASSANDRA-1902-v3.patch, 
> CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a 
> pre-compacted CF during the compaction process.  This is now important since 
> CASSANDRA-1470 caches effectively nothing.  
> For example an active CF being compacted hurts reads since nothing is cached 
> in the new SSTable. 
> The purpose of this ticket then is to make sure SOME data is cached from 
> active CFs. This can be done by monitoring which old SSTables are in the page 
> cache and caching active rows in the new SSTable.
> A simpler yet similar approach is described here: 
> http://insights.oetiker.ch/linux/fadvise/

