[ https://issues.apache.org/jira/browse/CASSANDRA-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985774#action_12985774 ]
Jonathan Ellis commented on CASSANDRA-1882:
-------------------------------------------

How is this looking, Peter?

> rate limit all background I/O
> -----------------------------
>
>                 Key: CASSANDRA-1882
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1882
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 0.7.1
>
>
> There is a clear need to support rate limiting of all background I/O (e.g.,
> compaction, repair). In some cases background I/O is naturally rate limited
> as a result of being CPU bottlenecked, but in all cases where the CPU is not
> the bottleneck, background streaming I/O is almost guaranteed (barring a very,
> very smart RAID controller or I/O subsystem that happens to cater extremely
> well to the use case) to be detrimental to the latency and throughput of
> regular live traffic (reads).
> Ways in which live traffic is negatively affected by background I/O include:
> * Indirectly by page cache eviction (see e.g. CASSANDRA-1470).
> * Reads are directly detrimental when not otherwise limited for the usual
> reasons; large continuing read requests that keep coming are battling with
> latency-sensitive live traffic (mostly seek bound). Mixing seek-bound,
> latency-critical I/O with bulk streaming is a classic no-no for I/O scheduling.
> * Writes are directly detrimental in a similar fashion.
> * But in particular, writes are more difficult still: Caching effects tend to
> augment the effects because, lacking any kind of fsync() or direct I/O, the
> operating system and/or RAID controller tends to defer writes when possible.
> This often leads to a very sudden throttling of the application when caches
> are filled, at which point there is potentially a huge backlog of data to
> write.
> ** This may evict a lot of data from page cache since dirty buffers cannot be
> evicted prior to being flushed out (though CASSANDRA-1470 and related will
> hopefully help here).
> ** In particular, one major reason why battery-backed RAID controllers are
> great is that they have the capability to "eat" storms of writes very quickly
> and schedule them pretty efficiently with respect to a concurrent continuous
> stream of reads. But this ability is defeated if we just throw data at it
> until entirely full. Instead a rate-limited approach means that data can be
> thrown at said RAID controller at a reasonable pace and it can be allowed to
> do its job of limiting the impact of those writes on reads.
> I propose a mechanism whereby all such background reads are rate limited in
> terms of MB/sec throughput. There would be:
> * A configuration option to state the target rate (probably a global, until
> there is support for per-cf sstable placement)
> * A configuration option to state the sampling granularity. The granularity
> would have to be small enough for rate limiting to be effective (i.e., the
> amount of I/O generated in between each sample must be reasonably small)
> while large enough to not be expensive (neither in terms of gettimeofday()
> type overhead, nor in terms of causing smaller writes so that would-be
> streaming operations become seek bound). There would likely be a recommended
> value on the order of say 5 MB, with a recommendation to multiply that by
> the number of disks in the underlying device (5 MB assumes classic mechanical
> disks).
> Because of coarse granularity (= infrequent synchronization), there should
> not be a significant overhead associated with maintaining a shared global rate
> limiter for the Cassandra instance.
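
For illustration only, here is a rough sketch of how the two options described in the issue (target rate and sampling granularity) might interact in a shared limiter. It is written in Python rather than as Cassandra code, and every name in it (BackgroundIOThrottle, account, MB) is hypothetical and not taken from any patch on this ticket. The clock is only consulted once a full granularity's worth of bytes has been accounted, which is the "infrequent synchronization" the description relies on.

```python
import threading
import time

MB = 1024 * 1024


class BackgroundIOThrottle:
    """Hypothetical shared limiter for background I/O (not Cassandra code)."""

    def __init__(self, target_bytes_per_sec, granularity_bytes,
                 clock=time.monotonic, sleep=time.sleep):
        self._target = float(target_bytes_per_sec)  # the "target rate" option
        self._granularity = granularity_bytes       # the "sampling granularity" option
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._pending = 0                # bytes accounted since the last sample
        self._last_sample = clock()      # when the current sample window began

    def account(self, nbytes):
        """Record nbytes of background I/O; return the seconds slept (0.0 if none)."""
        with self._lock:
            self._pending += nbytes
            if self._pending < self._granularity:
                return 0.0               # cheap path: no clock read, no sleep
            now = self._clock()
            expected = self._pending / self._target  # time this much I/O should take
            elapsed = now - self._last_sample
            delay = max(0.0, expected - elapsed)
            # The next window starts once this caller has finished sleeping.
            self._last_sample = now + delay
            self._pending = 0
        if delay > 0:
            self._sleep(delay)           # sleep outside the lock
        return delay
```

A compaction or repair thread would call account(len(chunk)) after each chunk it reads or writes. With a 10 MB/s target and the 5 MB granularity suggested above, a caller that moves 5 MB instantaneously is put to sleep for half a second. Idle time is not banked as credit in this sketch, so a pause in background work does not permit a later burst.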