[ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012783#comment-13012783 ]
Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

Catching up with the ticket history and the latest version of the patch, a few things based on the history and the patch themselves (I have not tested or benchmarked anything):

With respect to avoiding waiting on GC: the munmap() is still in finalize(), so we're still waiting on GC, right? Just not on every possible ByteBuffer (only on the MappedFileSegment itself).

BufferedSegmentedFile.tryPreserveFilePageCache() calls tryPreserveCacheRegion() for every page considered hot. The first thing to be aware of is that this translates into a posix_fadvise() syscall for every page, even when all or almost all pages are in fact in memory. This may be acceptable, but keep in mind that use cases where all or almost all pages are in cache are likely to be the ones that are CPU-bound rather than disk-bound.

The bigger issue with the same code is that in the case of the large column families we're trying to optimize for, unless I am missing something, the preservation process is expected to be entirely seek-bound for sparsely hot sstables. In the best case, for mostly-hot sstables, it might not be seek-bound provided that pre-fetching and/or read-ahead and/or linear access detection is working well, but that seems very dependent on system details and the type of load the system is under (and probably less likely to work well under high "live" read I/O loads). In the non-best case (sparsely hot), it should most definitely be entirely seek-bound.

fadvising entire regions at once instead of once per page might improve that, but I still think the better solution is to just not DONTNEED hot data to begin with (subject to potential limitations to avoid too-frequent DONTNEEDs). Note: the original motivation for avoiding frequent DONTNEEDs was the performance cost of the syscall, but in this case we're taking a "one syscall per page" hit anyway with the WILLNEEDs.
In fact, in the case of a very hot sstable (where CPU efficiency matters more than for a cold sstable, where disk I/O matters more), the WILLNEEDs would be more numerous than the DONTNEEDs would have been, had the DONTNEEDs been "fragmented" according to a hotness map.

Disregarding the CPU efficiency concerns though, my primary concern is the WILLNEED calls. Again, I haven't tested to make sure I'm not mis-reading it, but this should mean that every compaction of an actively used sstable will end, after the streaming I/O, with lots of seek-bound reads to fulfill the WILLNEEDs. This can take a lot of time and be expensive in terms of the amount of "disk time" spent (relative to a rate-limited compaction process), and it also violates the otherwise preserved rule that "the only seek-bound I/O is live reads; all other I/O is sequential".

Also: if WILLNEED blocks until the data has been read, the impact on live traffic should be limited, but on the other hand latency would be high under read load. If WILLNEED doesn't block, throughput has a chance of being reasonable by maintaining some queue depth, but on the other hand live reads could be severely affected. (I don't know which is true; I should check, but I haven't yet.)

Minor nit: seemingly truncated doc string for SegmentedFile.complete().

Minor suggestion: should isRangeInCache() be renamed to wasRangeInCache() to reflect the fact that it does not represent current status? This is not an implementation detail, because if it did reflect current reality the caller would be incorrect: the per-column test would constantly give false positives for being in cache, due to (1) the column just having been serialized, which would be easy to fix, but also (2) previous columns residing on the same page, which is harder to fix than moving a line of code.
> Migrate cached pages during compaction
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>             Fix For: 0.7.5, 0.8
>
>         Attachments: 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt, 1902-formatted.txt, 1902-per-column-migration-rebase2.txt, 1902-per-column-migration.txt, CASSANDRA-1902-v3.patch, CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a pre-compacted CF during the compaction process. This is now important since CASSANDRA-1470 caches effectively nothing.
> For example, an active CF being compacted hurts reads since nothing is cached in the new SSTable.
> The purpose of this ticket then is to make sure SOME data is cached from active CFs. This can be done by monitoring which old SSTables are in the page cache and caching active rows in the new SSTable.
> A simpler yet similar approach is described here: http://insights.oetiker.ch/linux/fadvise/

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira