[ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030512#comment-13030512 ]
Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

There's been a lot of discussion spread over several tickets, and it's a complex issue that is almost impossible to benchmark without analyzing what is actually happening. For example, DONTNEED à la CASSANDRA-1470, without the migration proposed in this ticket, is expected to be an improvement for truly large datasets: DONTNEED (if it worked, with fsync etc.) reduces the impact on live traffic while a long-running compaction is in progress - but you still take the coldness hit once compaction completes. On the other hand, the very same optimization is a total regression in performance if you have a data set that specifically relies on being small enough that sstables fit in RAM (with sufficient margin). In that case, prior to CASSANDRA-1470 your data would perhaps remain in memory at all times, while with 1470 you'd expect much more uneven performance as sudden spikes in coldness affect you. Benchmarking will show whatever you want it to show depending on what parameters you choose ;)

I still think that the two primary problems that need to be dealt with are (1) the direct impact of background I/O and (2) the indirect effects due to caching (basically what I talked about in the original post to CASSANDRA-1882, a ball which I totally dropped on the floor - sorry). The complexity mostly arises from how other factors - data set size, read access pattern, I/O scheduling, RAID vs. plain disks, etc. - affect the impact of these two underlying issues. The relevance of the effects will also differ greatly depending on the intended behavior and goals of the cluster; for some use cases, maybe going down to disk is totally fine because there is very little live read traffic, and it is more important to keep latency reasonable for the few requests that do reach the system than to optimize for throughput.
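The "if it worked, with fsync etc." caveat above is about a real interaction: DONTNEED silently skips dirty pages, so advice issued on freshly compacted output does nothing unless the data has been written back first. A minimal sketch of the fsync-then-DONTNEED sequence; the helper name is mine, not anything in Cassandra:

```python
import os

def drop_from_page_cache(path):
    """Flush a file's dirty pages, then hint the kernel to evict it.

    posix_fadvise(DONTNEED) ignores pages that are still dirty, which is
    why the discussion stresses fsync'ing first: without writeback, pages
    just written by compaction simply stay in the page cache.
    Returns True if the eviction hint was issued (POSIX platforms only).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # force writeback so DONTNEED can actually evict
        if hasattr(os, "posix_fadvise"):  # absent on some platforms
            # offset=0, length=0 means "the whole file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            return True
        return False
    finally:
        os.close(fd)
```

On a real compaction path the advice would be issued incrementally over just-written ranges rather than once over the whole file, to bound the amount of dirty data flushed at a time.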
For other cases, going down to disk is completely unacceptable if the cluster is to cope with the live read traffic.

I'm thinking maybe a more over-arching view is in order: come up with an approach that gives well-rounded behavior overall, where the fundamental I/O trade-offs are considered while surrounding behavior - compaction, repair, sstable sizing, avoiding transient explosions in live set size, etc. - is also taken into account. As an arbitrary starting point, imagine a future version of Cassandra which does all of the following at the same time:

(1) Supports concurrent compaction that is de-coupled from CPU concurrency, so there is no need for a maximum concurrency; small and big jobs could more or less seamlessly run concurrently.

(2) Limits sstable sizes, achieving reasonably sized work units for bulk work.

(3) Does incremental "streaming" (not in the current sense) repair, where we do one "reasonably sized unit of work" (per (2)) at a time and do not stream further data until prior work has completed - including the compaction needed to reconcile - to avoid sudden jumps in data set size.

(4) Spreads compaction work out over time, as in the throttling code that has been tested a bit by now (was it Chris who reported that it worked well?).

(5) Includes the changes coming in 0.8 that make repair only repair the range of the primary replica, making repairs easier to reason about and schedule in a cluster, and making each repair operation shorter and thus less of an administrative problem in terms of timing w.r.t. other cluster operations.

(6) Has compactions, whenever and at whatever speed they happen, take advantage of the limited work unit/sstable sizes, in addition to taking steps (posix_fadvise/fsync/etc., direct I/O, or whatever) to mitigate or eliminate the impact of these operations on cache locality.
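The throttling idea in point (4) amounts to pacing compaction I/O to a target byte rate. A minimal sketch of that pacing logic - the function names and the simple sleep-based scheme are mine for illustration, not the actual Cassandra throttling code:

```python
import time

def throttle_delay(bytes_done, target_bytes_per_sec, elapsed_sec):
    """Seconds to sleep so bytes_done / total_elapsed <= target rate."""
    if target_bytes_per_sec <= 0:  # treat 0 as "unthrottled"
        return 0.0
    expected = bytes_done / target_bytes_per_sec  # time this much work *should* take
    return max(0.0, expected - elapsed_sec)

def copy_throttled(read_chunk, write_chunk, rate, chunk_size=64 * 1024):
    """Copy via the supplied callables, sleeping to respect `rate` bytes/sec."""
    start, done = time.monotonic(), 0
    while True:
        buf = read_chunk(chunk_size)
        if not buf:
            break
        write_chunk(buf)
        done += len(buf)
        time.sleep(throttle_delay(done, rate, time.monotonic() - start))
```

Spreading the work out this way trades longer compaction wall-clock time for a smaller, steadier dent in the I/O bandwidth available to live reads.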
> Migrate cached pages during compaction
> ---------------------------------------
>
> Key: CASSANDRA-1902
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.7.1
> Reporter: T Jake Luciani
> Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments:
> 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt,
> 1902-BufferedSegmentedFile-logandsleep.txt, 1902-formatted.txt,
> 1902-per-column-migration-rebase2.txt, 1902-per-column-migration.txt,
> CASSANDRA-1902-v10-trunk-rebased.patch, CASSANDRA-1902-v3.patch,
> CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch, CASSANDRA-1902-v6.patch,
> CASSANDRA-1902-v7.patch, CASSANDRA-1902-v8.patch,
> CASSANDRA-1902-v9-trunk-rebased.patch,
> CASSANDRA-1902-v9-trunk-with-jmx.patch, CASSANDRA-1902-v9-trunk.patch,
> CASSANDRA-1902-v9.patch
>
> Original Estimate: 32h
> Time Spent: 56h
> Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a
> pre-compacted CF during the compaction process. This is now important since
> CASSANDRA-1470 effectively caches nothing.
> For example, an active CF being compacted hurts reads, since nothing is cached
> in the new SSTable.
> The purpose of this ticket, then, is to make sure SOME data is cached from
> active CFs. This can be done by monitoring which old SSTables are in the page
> cache and caching active rows in the new SSTable.
> A simpler yet similar approach is described here:
> http://insights.oetiker.ch/linux/fadvise/

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira