[ https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378235#comment-16378235 ]

Oleksandr Shulgin edited comment on CASSANDRA-14210 at 2/27/18 8:35 AM:
------------------------------------------------------------------------

We are observing a very similar problem with ordinary compaction.  I'm not 
sure whether the proposed change could cover both cases (with the difference 
that for regular compaction you likely want to start with the smallest tables 
first, though this is up to the actual compaction strategy).

A node runs with {{concurrent_compactors=2}} and is doing a rather big 
compaction (> 200 GB) on one table.  At the same time, a lot of small files 
are streamed in by repair for a different table.  The number of {{*-Data.db}} 
files for that other table grows as high as 5,500, and the estimated number 
of pending compaction tasks for the node jumps to over 180.  Yet no compaction 
is started for the table with the many small data files until the currently 
running compaction task finishes.  Why is that?  I would expect a free 
compaction slot to be picked up immediately by new tasks.
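
For illustration, here is a minimal sketch (not Cassandra's actual 
{{CompactionManager}} code; the class and method names are hypothetical) of 
the scheduling behaviour I would expect: a fixed pool of 
{{concurrent_compactors}} worker slots, where each slot picks up the next 
pending task as soon as it frees up, instead of waiting for a whole batch to 
finish:

{code:java}
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Cassandra's actual scheduler: N worker slots
// drain a shared queue of pending compaction tasks, so a slot that frees
// up immediately starts the next task instead of idling behind one
// long-running compaction.
public class SlotScheduler
{
    public static void runAll(ConcurrentLinkedQueue<Runnable> pending, int concurrentCompactors)
            throws InterruptedException
    {
        ExecutorService pool = Executors.newFixedThreadPool(concurrentCompactors);
        for (int i = 0; i < concurrentCompactors; i++)
        {
            pool.submit(() -> {
                Runnable task;
                // poll() is thread-safe: each free slot grabs the next task.
                while ((task = pending.poll()) != null)
                    task.run();
            });
        }
        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
    }
}
{code}

With this pattern the big > 200 GB compaction occupies one slot, while the 
second slot keeps churning through the small tasks produced by the repair 
streams.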



> Optimize SSTables upgrade task scheduling
> -----------------------------------------
>
>                 Key: CASSANDRA-14210
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14210
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Oleksandr Shulgin
>            Assignee: Kurt Greaves
>            Priority: Major
>             Fix For: 4.x
>
>
> When starting the SSTable-rewrite process by running {{nodetool 
> upgradesstables --jobs N}}, with N > 1, not all of the N provided slots are 
> used.
> For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}.  
> What we observed, on both version 2.2 and 3.0, is that initially all 4 
> provided slots are used for "Upgrade sstables" compactions, but later, when 
> some of the 4 tasks have finished, no new tasks are scheduled immediately.  
> It takes the last of the 4 tasks to finish before 4 new tasks are 
> scheduled.  This happens on every node we've observed.
> This doesn't utilize the available resources to the full extent allowed by 
> the --jobs N parameter.  In the field, on a cluster of 12 nodes with 4-5 TiB 
> of data each, we've seen the whole process take more than 7 days, instead of 
> the estimated 1.5-2 days (assuming close to full utilization of the N 
> slots).
> Instead, new tasks should be scheduled as soon as there is a free compaction 
> slot.
> Additionally, starting from the biggest SSTables could further reduce the 
> total time required for the whole process to finish on any given node.
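
For the last point, a minimal sketch of that largest-first ordering 
(longest-processing-time-first): sort the pending rewrite tasks by on-disk 
size, descending, before handing them to the worker slots.  {{SSTableTask}} 
and {{onDiskSize}} below are hypothetical names for illustration, not actual 
Cassandra APIs:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical wrapper for one pending "Upgrade sstables" task; onDiskSize
// is an assumed accessor, not a real Cassandra API.
record SSTableTask(String name, long onDiskSize, Runnable work) {}

class LargestFirst
{
    // Longest-processing-time-first: when slots are refilled as soon as they
    // free up, starting the biggest rewrites first tends to minimise the time
    // until the last task finishes on the node (the makespan).
    static ConcurrentLinkedQueue<Runnable> order(List<SSTableTask> pending)
    {
        List<SSTableTask> sorted = new ArrayList<>(pending);
        sorted.sort(Comparator.comparingLong(SSTableTask::onDiskSize).reversed());
        ConcurrentLinkedQueue<Runnable> queue = new ConcurrentLinkedQueue<>();
        sorted.forEach(t -> queue.add(t.work()));
        return queue;
    }
}
{code}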


