[ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090005#comment-13090005
 ] 

Benjamin Coverston commented on CASSANDRA-1608:
-----------------------------------------------

{quote}
What is the 1.25 supposed to be doing here?
{quote}
Not sure what I was thinking; I was screwing around with giving the promoted range 
a size, but it looks like that ended up in the wrong place.
{quote}
Why the "all on the same level" special case? Is this just saying "L0 
compactions must go into L1?"
{quote}
Yes. The same logic also applies when a compaction gets triggered into an empty 
target level.
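For clarity, here is a minimal sketch of that rule (illustrative names only, not the attached patch): when every candidate sits on a single level, whether because the source is L0 or because the target level is empty, the output gets promoted rather than written back to the same level.

{code}
import java.util.Collection;

public class PromotionRuleSketch
{
    // Decide which level the compaction output should land in, given the
    // levels of the participating sstables.
    public static int outputLevel(Collection<Integer> candidateLevels)
    {
        if (candidateLevels.isEmpty())
            return 0;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int level : candidateLevels)
        {
            min = Math.min(min, level);
            max = Math.max(max, level);
        }
        // all candidates on one level: covers L0 -> L1 and an empty target level
        if (min == max)
            return max + 1;
        // otherwise the output stays in the higher level that participated
        return max;
    }
}
{code}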
{quote}
removed this. if L0 is large, it doesn't necessarily follow that L1 is large 
too. I don't see a good reason to second-guess the scoring here.
{quote}
Actually, this was there to prevent an OOM exception when too many SSTables were 
participating in a single compaction. You are, however, correct that it doesn't 
follow that L1 is large in all cases. I'll revise this to put an upper bound on 
the list of L0 candidates in a given compaction.
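Roughly the shape of that revision, as a hedged sketch (the bound, method, and class names are placeholders): the overlapping L0 sstables get truncated to a fixed maximum before the compaction is built, so a large backlog in L0 can't blow up a single compaction.

{code}
import java.util.ArrayList;
import java.util.List;

public class L0CandidateCap
{
    // assumed upper bound; in practice this would be a tunable or derived value
    private static final int MAX_L0_CANDIDATES = 32;

    // Return at most MAX_L0_CANDIDATES of the overlapping L0 sstables; the
    // remainder stay in L0 and are picked up by a later compaction.
    public static <T> List<T> boundedCandidates(List<T> overlappingL0)
    {
        if (overlappingL0.size() <= MAX_L0_CANDIDATES)
            return overlappingL0;
        return new ArrayList<T>(overlappingL0.subList(0, MAX_L0_CANDIDATES));
    }
}
{code}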

{quote}
L0 only gets two sstables before it's overcapacity? Are we still allowing L0 
sstables to be large? if so it's not even two
{quote}

I was screwing around with this threshold. One of the side effects of the 
dynamic flush thresholds was that I could end up with a substantial number of 
small SSTables "stuck" in L0. One way to fix this is to always give L0 a small 
positive score when there are any SSTables in L0, so that those SSTables get 
cleared out once the rest of the leveling has been done. Previously I was using 
the memtable flush threshold as the multiplier for L0, but with dynamic 
flushing and global memtable thresholds that doesn't mean much anymore. I'm 
inclined to leave it and perhaps raise the multiplier for L0 from 2 to 4.
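As a sketch of what that scoring could look like (the multiplier value and names are illustrative, not final): L0's score is driven by its sstable count divided by the multiplier, but never drops to zero while L0 holds anything, so the level always drains eventually.

{code}
public class L0ScoreSketch
{
    // candidate value if the multiplier gets raised from 2 to 4
    private static final int L0_MULTIPLIER = 4;

    public static double l0Score(int sstablesInL0)
    {
        if (sstablesInL0 == 0)
            return 0.0;
        double score = (double) sstablesInL0 / L0_MULTIPLIER;
        // keep a small positive floor so a non-empty L0 is always eventually compacted
        return Math.max(score, 0.01);
    }
}
{code}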


.bq "Exposing number of SSTables in L0 as a JMX property probably isn't a bad 
idea."

I'll get this in.
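For reference, a minimal sketch of the JMX exposure, following the standard MBean pattern; the class, interface, and ObjectName below are placeholders rather than what will actually land, and the two types would live in separate files.

{code}
public interface LeveledManifestMetricsMBean
{
    // number of sstables currently sitting in level 0
    int getLevel0SSTableCount();
}

public class LeveledManifestMetrics implements LeveledManifestMetricsMBean
{
    private final java.util.concurrent.atomic.AtomicInteger level0Count =
            new java.util.concurrent.atomic.AtomicInteger();

    // called by the manifest whenever sstables are added to or promoted out of L0
    public void setLevel0Count(int count)
    {
        level0Count.set(count);
    }

    public int getLevel0SSTableCount()
    {
        return level0Count.get();
    }

    public void register() throws Exception
    {
        javax.management.MBeanServer mbs =
                java.lang.management.ManagementFactory.getPlatformMBeanServer();
        mbs.registerMBean(this, new javax.management.ObjectName(
                "org.apache.cassandra.db:type=LeveledManifestMetrics"));
    }
}
{code}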

bq. It's not correct for the create/load code to assume that the first data 
directory stays constant across restarts – it should check all directories when 
loading.

I'll fix this.
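A hedged sketch of what the load-side check could look like, assuming a simple helper (the directory list and file name are placeholders): every configured data directory is searched before deciding the file does not exist.

{code}
import java.io.File;

public class DataDirectorySearch
{
    // Look for an existing file under any of the configured data directories,
    // instead of assuming it lives under the first one.
    public static File findExisting(String[] dataDirectories, String relativePath)
    {
        for (String directory : dataDirectories)
        {
            File candidate = new File(directory, relativePath);
            if (candidate.exists())
                return candidate;
        }
        return null; // not present anywhere; caller can create it in a chosen directory
    }
}
{code}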

{quote}
CFS
===
- not immediately clear to me if the TODOs in isKeyInRemainingSSTables are 
something I should be concerned about
{quote}
I cleaned this up

{quote}
- why do we need the reference mark/unmark now but not before?  is this a bug 
fix independent of 1608?
{quote}

bq. Use reference counting to delete sstables instead of relying on the GC 
patch by slebresne; reviewed by jbellis for CASSANDRA-2521 git-svn-id: 
https://svn.apache.org/repos/asf/cassandra/trunk@1149085 
13f79535-47bb-0310-9956-ffa450edef68

I assumed that since I was doing operations on these SSTables in the referenced 
views, I would also need to use these references.
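A simplified, illustrative sketch of the mark/unmark pattern referenced above (not the actual SSTableReader code): readers take a reference before using an sstable and release it afterwards, and the files are only deleted once the sstable has been marked compacted and the count drops to zero.

{code}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedSSTable
{
    // starts at 1 to represent membership in the live view
    private final AtomicInteger references = new AtomicInteger(1);
    private final AtomicBoolean compacted = new AtomicBoolean(false);

    // returns false if the sstable has already been released, so the caller
    // must grab a fresh view and try again
    public boolean acquireReference()
    {
        while (true)
        {
            int n = references.get();
            if (n <= 0)
                return false;
            if (references.compareAndSet(n, n + 1))
                return true;
        }
    }

    public void releaseReference()
    {
        if (references.decrementAndGet() == 0 && compacted.get())
            deleteFiles();
    }

    // called by compaction once the sstable's data lives in its replacement
    public void markCompacted()
    {
        compacted.set(true);
        releaseReference(); // drop the "live view" reference taken at construction
    }

    private void deleteFiles()
    {
        // a real implementation would delete the data/index/filter components here
    }
}
{code}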

{quote}
- are we losing a lot of cycles to markCurrentViewReferenced on the read path 
now that this is 1000s of sstables instead of 10s?
{quote}

Yes, this is a potentially serious issue. This code gets called on every read, 
and that's a pretty heavy price to pay.
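To make the cost concrete, a sketch of the hot loop (building on the RefCountedSSTable sketch above; the method name mirrors markCurrentViewReferenced but the code is illustrative): every read walks the entire live set to bump reference counts, so the work per read grows linearly with the sstable count, which leveling pushes from tens into thousands.

{code}
import java.util.ArrayList;
import java.util.List;

public class ViewReferenceSketch
{
    // O(n) per read, where n is the number of live sstables
    public static List<RefCountedSSTable> markCurrentViewReferenced(List<RefCountedSSTable> liveSSTables)
    {
        List<RefCountedSSTable> marked = new ArrayList<RefCountedSSTable>(liveSSTables.size());
        for (RefCountedSSTable sstable : liveSSTables)
        {
            if (sstable.acquireReference())
                marked.add(sstable);
        }
        return marked;
    }
}
{code}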





> Redesigned Compaction
> ---------------------
>
>                 Key: CASSANDRA-1608
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Benjamin Coverston
>         Attachments: 1608-22082011.txt, 1608-v2.txt, 1608-v4.txt
>
>
> After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
> thinking on this subject that I wanted to lay out.
> I propose we redo the concept of how compaction works in Cassandra. At the 
> moment, compaction is kicked off based on a write access pattern, not read 
> access pattern. In most cases, you want the opposite. You want to be able to 
> track how well each SSTable is performing in the system. If we were to keep 
> statistics in-memory of each SSTable, prioritize them based on most accessed, 
> and bloom filter hit/miss ratios, we could intelligently group sstables that 
> are being read most often and schedule them for compaction. We could also 
> schedule lower priority maintenance on SSTables that are not often accessed.
> I also propose we limit the size of each SSTable to a fixed size, which gives 
> us the ability to better utilize our bloom filters in a predictable manner. 
> At the moment after a certain size, the bloom filters become less reliable. 
> This would also allow us to group data most accessed. Currently the size of 
> an SSTable can grow to a point where large portions of the data might not 
> actually be accessed as often.
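As a toy illustration of the per-SSTable statistics the quoted proposal describes (all names and the weighting are hypothetical): read counts and bloom filter false positives per sstable, combined into a priority a read-driven compaction strategy could sort on.

{code}
import java.util.concurrent.atomic.AtomicLong;

public class SSTableReadStats
{
    private final AtomicLong reads = new AtomicLong();
    private final AtomicLong bloomFalsePositives = new AtomicLong();

    public void recordRead()               { reads.incrementAndGet(); }
    public void recordBloomFalsePositive() { bloomFalsePositives.incrementAndGet(); }

    // Arbitrary illustrative weighting: hot sstables score high, and wasted
    // bloom filter checks push the score higher still.
    public double compactionPriority()
    {
        return reads.get() + 10.0 * bloomFalsePositives.get();
    }
}
{code}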


