[ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030512#comment-13030512 ]

Peter Schuller commented on CASSANDRA-1902:
-------------------------------------------

There's been a lot of discussion spread over several tickets, and it's a 
complex issue that is almost impossible to benchmark meaningfully without 
analyzing what is actually happening. For example, DONTNEED a la 
CASSANDRA-1470, without the migration proposed in this ticket, is expected to 
be an improvement for truly large datasets: there, DONTNEED (assuming it 
works, with fsync etc.) limits the impact of a long-running compaction on 
live traffic while it is running - but you still take the coldness hit once 
the compaction completes.
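
To make the 1470-style eviction concrete, it amounts to something like the 
following C sketch (my own hypothetical helper, not Cassandra's actual 
JNA-based code); the fsync matters because DONTNEED silently skips dirty 
pages:

    #define _XOPEN_SOURCE 600  /* for posix_fadvise on glibc */
    #include <fcntl.h>
    #include <unistd.h>

    /* Flush a file and then ask the kernel to drop its cached pages.
     * Hypothetical helper; error handling is minimal on purpose. */
    int evict_from_page_cache(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        /* DONTNEED skips dirty pages, so write them back first. */
        if (fsync(fd) != 0) {
            close(fd);
            return -1;
        }

        /* offset 0, len 0 means "the whole file"; purely advisory. */
        int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
        return rc;
    }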

On the other hand, the very same optimization is a total regression in 
performance if you have a data set which specifically relies on being small 
enough that the sstables fit in RAM (with sufficient margin). In this case, 
prior to CASSANDRA-1470 your data would perhaps remain in memory at all 
times, while with 1470 you'd expect much more uneven performance as sudden 
spikes in coldness hit you.

Benchmarking will show whatever you want it to show depending on what 
parameters you choose ;)

I still think that the two primary problems that need to be dealt with are (1) 
the direct impact of background I/O and (2) the indirect effects due to caching 
(basically what I talk about in the original post to CASSANDRA-1882, a ball 
which I totally dropped on the floor - sorry). The complexity mostly arises 
from how other factors, such as data set size, read access pattern, I/O 
scheduling, RAID vs. normal disks, etc, affect the impact of these two 
underlying issues. The relevance of these effects will also differ greatly 
depending on the intended behavior and goals of the cluster; for some 
use-cases, maybe going down to disk is totally fine because there are very 
few live reads and it's just important to keep latency reasonable for the few 
requests that do reach the system, rather than to optimize for throughput. 
For other cases, going down to disk is completely unacceptable if the cluster 
is to cope with the live read traffic.

I'm thinking maybe a more over-arching view is in order: coming up with an 
approach that gives well-rounded overall behavior, one where the fundamental 
I/O trade-offs are considered and surrounding behavior - compaction, repair, 
sstable sizing, avoiding transient explosions in live set size, etc. - is 
also taken into consideration.

As an arbitrary start point, imagine a future version of Cassandra which does 
the following at the same time:

 (1) Support for concurrent compaction that is de-coupled from CPU 
concurrency, such that there is no need for a maximum concurrency; small and 
big jobs could run concurrently more or less seamlessly.
 (2) Limited sstable sizes achieving reasonably sized work units for bulk work.
 (3) Incremental "streaming" (not in the current sense) repair where we do 
one "reasonably sized unit of work" (see (2)) at a time and do not stream 
further data until prior work - including the compaction needed to reconcile 
- has completed, to avoid sudden jumps in data set size.
 (4) Compaction work spread out over time, as in the throttling code that has 
been tested a bit by now (was it Chris who reported that it worked well?); a 
sketch of the idea follows this list.
 (5) The changes coming into 0.8 to make repair only repair the range of the 
primary replica, making it easier to reason about and schedule repairs in a 
cluster, and making each repair operation shorter in nature and thus less of an 
administrative problem in terms of timing w.r.t. other cluster operations.
 (6) Compactions, whenever/at whatever speed they do happen, also taking 
advantage of the limited work unit/sstable sizes, in addition to taking steps 
(whether posix_fadvise/fsync/etc, direct I/O, or something else) to mitigate 
or eliminate the impact of these operations on cache locality.
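
As promised under (4), here is a minimal C sketch of what sleep-based 
throttling of bulk work looks like; the names, the chunk-based structure, and 
target_bps are all hypothetical, and this is only the shape of the idea, not 
Cassandra's actual throttling code:

    #define _XOPEN_SOURCE 600  /* for clock_gettime and usleep */
    #include <stdint.h>
    #include <time.h>
    #include <unistd.h>

    static double elapsed_seconds(struct timespec start)
    {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        return (now.tv_sec - start.tv_sec) +
               (now.tv_nsec - start.tv_nsec) / 1e9;
    }

    /* Call after each chunk of work: if we are ahead of the schedule
     * implied by target_bps (bytes/second), sleep off the difference. */
    static void throttle(uint64_t bytes_done, uint64_t target_bps,
                         struct timespec start)
    {
        double expected = (double)bytes_done / (double)target_bps;
        double actual = elapsed_seconds(start);
        if (expected > actual)
            usleep((useconds_t)((expected - actual) * 1e6));
    }

In a compaction-style loop one would record the start time once with 
clock_gettime(CLOCK_MONOTONIC, ...) and call throttle() after every chunk 
read and written; the cap then holds over the whole run rather than per 
chunk, which is what spreads the work out over time.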


> Migrate cached pages during compaction 
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: 
> 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt, 
> 1902-BufferedSegmentedFile-logandsleep.txt, 1902-formatted.txt, 
> 1902-per-column-migration-rebase2.txt, 1902-per-column-migration.txt, 
> CASSANDRA-1902-v10-trunk-rebased.patch, CASSANDRA-1902-v3.patch, 
> CASSANDRA-1902-v4.patch, CASSANDRA-1902-v5.patch, CASSANDRA-1902-v6.patch, 
> CASSANDRA-1902-v7.patch, CASSANDRA-1902-v8.patch, 
> CASSANDRA-1902-v9-trunk-rebased.patch, 
> CASSANDRA-1902-v9-trunk-with-jmx.patch, CASSANDRA-1902-v9-trunk.patch, 
> CASSANDRA-1902-v9.patch
>
>   Original Estimate: 32h
>          Time Spent: 56h
>  Remaining Estimate: 0h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a 
> pre-compacted CF during the compaction process.  This is now important since 
> CASSANDRA-1470 caches effectively nothing.  
> For example, an active CF being compacted hurts reads since nothing is 
> cached in the new SSTable. 
> The purpose of this ticket then is to make sure SOME data is cached from 
> active CFs. This can be done by monitoring which old SSTables are in the 
> page cache and caching the active rows in the new SSTable.
> A simpler yet similar approach is described here: 
> http://insights.oetiker.ch/linux/fadvise/
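
For reference, the page-cache monitoring the description alludes to can be 
built on mincore(2), much as the article linked above does; this C sketch (a 
hypothetical helper of mine, not anything in the attached patches) counts how 
many pages of a file are resident:

    #define _DEFAULT_SOURCE  /* for mincore on glibc */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Count how many pages of a file are in the page cache.
     * Returns the number of resident pages, or -1 on error. */
    long resident_pages(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) != 0 || st.st_size == 0) {
            close(fd);
            return -1;
        }

        /* PROT_NONE suffices: we only query residency, never touch data. */
        void *map = mmap(NULL, st.st_size, PROT_NONE, MAP_SHARED, fd, 0);
        close(fd);
        if (map == MAP_FAILED)
            return -1;

        long page = sysconf(_SC_PAGESIZE);
        size_t npages = (st.st_size + page - 1) / page;
        unsigned char *vec = malloc(npages);
        long resident = -1;

        if (vec != NULL && mincore(map, st.st_size, vec) == 0) {
            resident = 0;
            for (size_t i = 0; i < npages; i++)
                resident += vec[i] & 1;  /* low bit set = page resident */
        }

        free(vec);
        munmap(map, st.st_size);
        return resident;
    }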

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
