[ https://issues.apache.org/jira/browse/CASSANDRA-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-16315:
-------------------------------------
    Description: 
Since CASSANDRA-7551, we have given the following advice for setting 
{{concurrent_compactors}}:

{code}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
{code}
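
For context, that advice sits next to this default behavior in a 3.x 
cassandra.yaml (quoted roughly; the default is already derived from disk and 
core counts and clamped, which is the kind of limit this ticket is about):

{code}
# concurrent_compactors defaults to the smaller of (number of disks,
# number of cores), with a minimum of 2 and a maximum of 8.
#concurrent_compactors: 1
{code}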

However, in practice there are a number of problems with this.  While it's true 
that one can increase {{concurrent_compactors}} to improve the efficiency of 
compactions on machines with more CPU cores, the context switching, random IO, 
and GC pressure associated with bringing compaction data into the heap work 
against the additional parallelism.

This has caused problems for those who have taken this advice literally.

I propose that we adjust this language to give an upper limit on 
{{concurrent_compactors}}, both in the 3.x line and in trunk, so that new users 
do not stumble when deciding whether to change the defaults.

See also CASSANDRA-7139 for a discussion on considerations.

I see two short-term options to avoid new user pain:

1. Change the language to say something like this (an illustrative 
cassandra.yaml rendering of the wording follows after this list):

{quote}
When using SSD-based storage, you can increase {{concurrent_compactors}}.  
However, be aware that using too many concurrent compactors can have 
detrimental effects such as GC pressure, more context switching between 
compactors and real-time operations, and more random IO pulling data for 
different compactions.  It's best to test and measure with your workload and 
hardware.
{quote}

2. Do some significant testing of compaction efficiency and read/write 
latency/throughput targets to see where the tipping point is, holding memory, 
heap size, and other configuration constant to keep it simple.  A rough shape 
of such a sweep is sketched below.
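
For option 1, the comment block in cassandra.yaml might end up reading 
something like this (wording illustrative, not final):

{code}
# If your data directories are backed by SSD, increasing concurrent_compactors
# can improve compaction throughput.  Be aware, however, that too many
# concurrent compactors can cause GC pressure, more context switching between
# compactions and real-time operations, and more random IO as data is pulled
# for different compactions.  Test and measure with your workload and hardware
# before raising this beyond the default.
{code}

For option 2, one rough shape of the sweep, assuming a build whose nodetool 
includes setconcurrentcompactors and using cassandra-stress; the op counts, 
thread counts, and file names here are placeholders that would need tuning per 
environment:

{code}
# Hold heap size, compaction throughput, and the workload constant; vary only
# the number of concurrent compactors and record latency/throughput per run.
for n in 2 4 8 16; do
  nodetool setconcurrentcompactors "$n"
  cassandra-stress write n=5000000 -rate threads=200 -log file="write_${n}.log"
  cassandra-stress read  n=5000000 -rate threads=200 -log file="read_${n}.log"
  nodetool compactionstats -H > "compactionstats_${n}.txt"
done
{code}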



> Remove bad advice on concurrent compactors from cassandra.yaml
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-16315
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16315
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Jeremy Hanna
>            Priority: Normal
>


