[jira] [Commented] (CASSANDRA-16634) Garbagecollect should not output all tables to L0 with LeveledCompactionStrategy

Ekaterina Dimitrova (Jira) Mon, 26 Apr 2021 17:03:06 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332814#comment-17332814
 ]


Ekaterina Dimitrova commented on CASSANDRA-16634:
-------------------------------------------------

Hey [~scottcarey],

Thank you for the patch. I was trying to find you on Slack to say hi and help 
clear up some misunderstandings about our docs, but I didn't find you in the 
ASF Slack. (You might want to check the #cassandra-dev channel and the dev 
mailing list - https://cassandra.apache.org/community/). If you drop a 
question in the #cassandra-dev channel, there is a good chance of getting a 
fast response, as we have contributors to the project all over the world and 
people are willing to help when they have a bit of time.

Both patch and pull request submission work, but you are right that pull 
requests are easier to handle and almost everyone uses them on the project 
nowadays. We don't open pull requests to the Apache/cassandra repo though. Most 
people just have their own fork where they work on things. I would suggest 
checking this page (if you haven't already): 
https://cassandra.apache.org/doc/latest/development/patches.html?highlight=contributing
  

Also, we normally try to run at least some preliminary tests. More information 
on how to run tests locally or set up CI can be found here: 
[https://cassandra.apache.org/doc/latest/development/testing.html?highlight=testing]

Unfortunately, at this point the official Jenkins CI is available only to 
committers and PMC members, but any one of us will be happy to submit a CI run 
for you. The CircleCI free tier is also an option; just bear in mind that you 
will see Python dtests failing there due to insufficient resources, which is 
expected.

> Garbagecollect should not output all tables to L0 with 
> LeveledCompactionStrategy
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16634
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16634
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Normal
>             Fix For: 3.11.x, 4.0.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> nodetool garbagecollect always outputs to L0 with LeveledCompactionStrategy.
> This is awful.  On a large LCS table, this means that at the end of the 
> garbagecollect process, all data is in L0.
>  
> This results in an awful sequence of useless temporary space usage and write 
> amplification:
>  # L0 is repeatedly size-tiered compacted until it no longer has too many 
> SSTables.  If the original LCS table had 2000 SSTables, this takes a long time.
>  # L0 is compacted to L1 in one or a few very large compactions.
>  # L1 is compacted to L2, L3 to L4, etc.  Write amplification galore.
> Due to the above, 'nodetool garbagecollect' is close to worthless for large 
> LCS tables.  A full compaction always causes less write amplification and 
> requires a similar amount of temporary disk space.  The only exception is if 
> you run 'nodetool garbagecollect' part-way, and then use 'nodetool stop' to 
> cancel it before L0 is too large.  In this case, if you are lucky and the 
> order in which it chose to process SSTables coincides with the tables that 
> have the most disk space to clear, you might free up enough disk space to 
> succeed in your original goal.
>  
> However, from what I can tell, there is no good reason to move the output to 
> L0.  Leaving the output table in the same SSTableLevel as the source table 
> does not violate any of the LeveledCompactionStrategy placement rules, as the 
> output by definition has a token range equal to or smaller than the source.
> The only drawback is if the size of the output files is significantly smaller 
> than the source, in which case the source level would be under-sized.   But 
> that seems like a problem that LCS has to handle, not garbagecollect.
> LCS could have a "pull up" operation where it does something like the 
> following.   Assume a table has L4 as the max level, and L3 and L4 are both 
> 'under-sized'.  L3 can attempt to 'pull up' any tables from L4 that do not 
> overlap with the token ranges of the L3 tables.  After that, it can choose to 
> do some compactions that mix L3 and L4 to pull up data into L3 if it is still 
> significantly under-sized.
> From what I can tell, garbagecollect should just re-write tables in place, 
> and leave the compaction strategy to deal with any consequences.
> Moving to L0 is a bad idea.  In addition to the extra write amplification and 
> extreme increase in temporary disk space required, I observed the following:
> A 'nodetool garbagecollect' was placing a lot of pressure on the L0 of a node.  
> We stopped it about 20% through the process, and it managed to compact down 
> the top couple of levels.  So we tried to run 'garbagecollect' again, but the 
> first tables it chose to operate on were in L1, not the 'leaves' in L5!  This 
> was because the order of SSTables chosen currently does not consider the 
> level, and instead looks purely at the max timestamp in the file.  But 
> because we moved _very old_ data from L5 into L0 as a result of the prior 
> garbagecollect, many tables in L1 and L2 now had very wide ranges between 
> their min and max timestamps – essentially some of the oldest and newest data 
> all in one table.  This breaks the usual structure of an LCS table where the 
> oldest data is at the high levels.
>  
> I hope that others agree that this is a bug, and deserving of a fix.
> I have a very simple patch for this that I will be creating a PR for soon.  3 
> lines for the code change, 70 lines for a new unit test.
>  
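
The placement argument in the quoted description (output left in the source 
level has a token range equal to or smaller than the source, so no LCS 
placement rule is violated) can be illustrated with a small model. This is only 
a sketch in Python, not Cassandra code: SSTable, overlaps, 
level_invariant_holds and rewrite_in_place are hypothetical names, tokens are 
simplified to plain integers, and the only rule modeled is "no overlap within 
a level above L0".

```python
from dataclasses import dataclass, replace
from itertools import combinations


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an SSTable: a level plus an inclusive token range."""
    name: str
    level: int
    first_token: int
    last_token: int


def overlaps(a: SSTable, b: SSTable) -> bool:
    """True when the inclusive token ranges of a and b intersect."""
    return a.first_token <= b.last_token and b.first_token <= a.last_token


def level_invariant_holds(sstables: list[SSTable]) -> bool:
    """Rule modeled here: no two SSTables in the same level >= 1 overlap."""
    return not any(
        a.level == b.level and a.level >= 1 and overlaps(a, b)
        for a, b in combinations(sstables, 2)
    )


def rewrite_in_place(source: SSTable, first_token: int, last_token: int) -> SSTable:
    """Model a garbagecollect rewrite whose output stays in the source's level.

    The output range must lie within the source range, because dropping
    data can only shrink the range, never widen it.
    """
    if not (source.first_token <= first_token <= last_token <= source.last_token):
        raise ValueError("output range must lie within the source range")
    return replace(source, first_token=first_token, last_token=last_token)


before = [
    SSTable("a", 1, 0, 99),
    SSTable("b", 1, 100, 199),
    SSTable("c", 2, 0, 49),
    SSTable("d", 2, 50, 120),
]
# Rewrite every SSTable with a slightly narrower range, keeping its level.
after = [rewrite_in_place(s, s.first_token + 1, s.last_token - 1) for s in before]

print(level_invariant_holds(before), level_invariant_holds(after))
```

In this model, shrinking a range can never create a new overlap inside a 
level, which is the point the description makes. The model does not cover the 
under-sized level case that the description leaves to LCS.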



