Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-16 Thread Ekaterina Dimitrova
Thanks for opening an epic @Jacek.

It seems the dtest_offheap job is replaced by dtest_latest which means we
will have the same amount of jobs after the current ticket and I am not
worried about Jenkins.

Though in CircleCI we did not have the dtest_offheap job mandatory run
pre-commit but as far as I can see this ticket suggests dtest_latest to be
mandatory run in the pre-commit workflow.
I would like to suggest we commit the current proposal. Only, I think the
config should be mentioned experimental somewhere.

As a short term solution to the raised consumption pre-commit tests run I
would like to suggest we accept running only the J11 pre-commit workflow
(which covers also tests run with J17) until we surface the other
discussion and we apply other test configuration changes/optimizations.

On Fri, 16 Feb 2024 at 9:08, Paulo Motta  wrote:

> Thanks for clarifying Branimir! I'm +1 on proceeding as proposed and I
> think this change will make it easier to gain confidence to update
> configurations.
>
> Interesting discussion and suggestions on this thread - I think we can
> follow-up on improving test/CI workflow in a different thread/proposal to
> avoid blocking this.
>
> On Thu, Feb 15, 2024 at 9:59 AM Branimir Lambov <
> branimir.lam...@datastax.com> wrote:
>
>> Paulo:
>>
>>> 1) Will cassandra.yaml remain the default test config? Is the plan
>>> moving forward to require green CI for both configurations on pre-commit,
>>> or pre-release?
>>
>> The plan is to ensure both configurations are green pre-commit. This
>> should not increase the CI cost as this replaces extra configurations we
>> were running before (e.g. test-tries).
>>
>> 2) What will this mean for the release artifact, is the idea to continue
>>> shipping with the current cassandra.yaml or eventually switch to the
>>> optimized configuration (ie. 6.X) while making the legacy default
>>> configuration available via an optional flag?
>>
>> The release simply includes an additional yaml file, which contains a
>> one-liner how to use it.
>>
>> Jeff:
>>
>>> 1) If there’s an “old compatible default” and “latest recommended
>>> settings”, when does the value in “old compatible default” get updated?
>>> Never?
>>
>> This does not change anything about these decisions. The question is very
>> serious without this patch as well: Does V6 have to support pain-free
>> upgrade from V5 working in V4 compatible mode? If so, can we ever deprecate
>> or drop anything? If not, are we not breaking upgradeability promises?
>>
>> 2) If there are test failures with the new values, it seems REALLY
>>> IMPORTANT to make sure those test failures are discovered + fixed IN THE
>>> FUTURE TOO. If pushing new yaml into a different file makes us less likely
>>> to catch the failures in the future, it seems like we’re hurting ourselves.
>>> Branimir mentions this, but how do we ensure that we don’t let this pattern
>>> disguise future bugs?
>>
>> The main objective of this patch is to ensure that the second yaml is
>> tested too, pre-commit. We were not doing this for all features we tell
>> users are supported.
>>
>> Paulo:
>>
>>> - if cassandra_latest.yaml becomes the new default configuration for
>>> 6.0, then precommit only needs to be run against thatversion - prerelease
>>> needs to be run against all cassandra.yaml variants.
>>
>> Assuming we keep the pace of development, there will be new "latest"
>> features in 6.0 (e.g. Accord could be one). The idea is more to move some
>> of the settings from latest to default when they are deemed mature enough.
>>
>> Josh:
>>
>>> I propose to significantly reduce that stuff. Let's distinguish the
>>> packages of tests that need to be run with CDC enabled / disabled, with
>>> commitlog compression enabled / disabled, tests that verify sstable formats
>>> (mostly io and index I guess), and leave other parameters set as with the
>>> latest configuration - this is the easiest way I think.
>>> For dtests we have vnodes/no-vnodes, offheap/onheap, and nothing about
>>> other stuff. To me running no-vnodes makes no sense because no-vnodes is
>>> just a special case of vnodes=1. On the other hand offheap/onheap buffers
>>> could be tested in unit tests. In short, I'd run dtests only with the
>>> default and latest configuration.
>>
>> Some of these changes are already done in this ticket.
>>
>> Regards,
>> Branimir
>>
>>
>>
>> On Thu, Feb 15, 2024 at 3:08 PM Paulo Motta  wrote:
>>
>>> > It's also been questioned about why we don't just enable settings we
>>> recommend.  These are settings we recommend for new clusters.  *Our
>>> existing cassandra.yaml needs to be tailored for existing clusters being
>>> upgraded, where we are very conservative about changing defaults.*
>>>
>>> I think this unnecessarily penalizes new users with subpar defaults and
>>> existing users who wish to use optimized/recommended defaults and need to
>>> maintain additional logic to support that. This change offers an
>>> opportunity to revisit this.
>>>
>>> 

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-16 Thread David Capwell
I’m +1 once the tests are passing ands +0 while they are failingSent from my iPhoneOn Feb 16, 2024, at 6:08 AM, Paulo Motta  wrote:Thanks for clarifying Branimir! I'm +1 on proceeding as proposed and I think this change will make it easier to gain confidence to update configurations.Interesting discussion and suggestions on this thread - I think we can follow-up on improving test/CI workflow in a different thread/proposal to avoid blocking this.On Thu, Feb 15, 2024 at 9:59 AM Branimir Lambov  wrote:Paulo:1) Will cassandra.yaml remain the default test config? Is the plan moving forward to require green CI for both configurations on pre-commit, or pre-release?The plan is to ensure both configurations are green pre-commit. This should not increase the CI cost as this replaces extra configurations we were running before (e.g. test-tries).2) What will this mean for the release artifact, is the idea to continue shipping with the current cassandra.yaml or eventually switch to the optimized configuration (ie. 6.X) while making the legacy default configuration available via an optional flag?The release simply includes an additional yaml file, which contains a one-liner how to use it.Jeff:1) If there’s an “old compatible default” and “latest recommended settings”, when does the value in “old compatible default” get updated? Never? This does not change anything about these decisions. The question is very serious without this patch as well: Does V6 have to support pain-free upgrade from V5 working in V4 compatible mode? If so, can we ever deprecate or drop anything? If not, are we not breaking upgradeability promises?2) If there are test failures with the new values, it seems REALLY IMPORTANT to make sure those test failures are discovered + fixed IN THE FUTURE TOO. If pushing new yaml into a different file makes us less likely to catch the failures in the future, it seems like we’re hurting ourselves. Branimir mentions this, but how do we ensure that we don’t let this pattern disguise future bugs? The main objective of this patch is to ensure that the second yaml is tested too, pre-commit. We were not doing this for all features we tell users are supported.Paulo:- if cassandra_latest.yaml becomes the new default configuration for 6.0, then precommit only needs to be run against thatversion - prerelease needs to be run against all cassandra.yaml variants.Assuming we keep the pace of development, there will be new "latest" features in 6.0 (e.g. Accord could be one). The idea is more to move some of the settings from latest to default when they are deemed mature enough.Josh:I propose to significantly reduce that stuff. Let's distinguish the packages of tests that need to be run with CDC enabled / disabled, with commitlog compression enabled / disabled, tests that verify sstable formats (mostly io and index I guess), and leave other parameters set as with the latest configuration - this is the easiest way I think. For dtests we have vnodes/no-vnodes, offheap/onheap, and nothing about other stuff. To me running no-vnodes makes no sense because no-vnodes is just a special case of vnodes=1. On the other hand offheap/onheap buffers could be tested in unit tests. In short, I'd run dtests only with the default and latest configuration.Some of these changes are already done in this ticket. Regards,BranimirOn Thu, Feb 15, 2024 at 3:08 PM Paulo Motta  wrote:> It's also been questioned about why we don't just enable settings we recommend.  These are settings we recommend for new clusters.  *Our existing cassandra.yaml needs to be tailored for existing clusters being upgraded, where we are very conservative about changing defaults.*I think this unnecessarily penalizes new users with subpar defaults and existing users who wish to use optimized/recommended defaults and need to maintain additional logic to support that. This change offers an opportunity to revisit this.Is not updating the default cassandra.yaml with new recommended configuration just to protect existing clusters from accidentally overriding cassandra.yaml with a new version during major upgrades? If so, perhaps we could add a new explicit flag “enable_major_upgrade: false” to “cassandra.yaml” that fails startup if an upgrade is detected and force operators to review the configuration before a major upgrade?Related to Jeff’s question, I think we need a way to consolidate “latest recommended settings” into “old compatible default” when cutting a new major version, otherwise the files will diverge perpetually.I think cassandra_latest.yaml offers a way to “buffer” proposals for default configuration changes which are consolidated into “cassandra.yaml” in the subsequent major release, eventually converging configurations and reducing the maintenance burden.On Thu, 15 Feb 2024 at 04:24 Mick Semb Wever  wrote:      Mick and Ekaterina (and everyone really) - any thoughts on what test coverage, if any, we should

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-16 Thread Paulo Motta
Thanks for clarifying Branimir! I'm +1 on proceeding as proposed and I
think this change will make it easier to gain confidence to update
configurations.

Interesting discussion and suggestions on this thread - I think we can
follow-up on improving test/CI workflow in a different thread/proposal to
avoid blocking this.

On Thu, Feb 15, 2024 at 9:59 AM Branimir Lambov <
branimir.lam...@datastax.com> wrote:

> Paulo:
>
>> 1) Will cassandra.yaml remain the default test config? Is the plan moving
>> forward to require green CI for both configurations on pre-commit, or
>> pre-release?
>
> The plan is to ensure both configurations are green pre-commit. This
> should not increase the CI cost as this replaces extra configurations we
> were running before (e.g. test-tries).
>
> 2) What will this mean for the release artifact, is the idea to continue
>> shipping with the current cassandra.yaml or eventually switch to the
>> optimized configuration (ie. 6.X) while making the legacy default
>> configuration available via an optional flag?
>
> The release simply includes an additional yaml file, which contains a
> one-liner how to use it.
>
> Jeff:
>
>> 1) If there’s an “old compatible default” and “latest recommended
>> settings”, when does the value in “old compatible default” get updated?
>> Never?
>
> This does not change anything about these decisions. The question is very
> serious without this patch as well: Does V6 have to support pain-free
> upgrade from V5 working in V4 compatible mode? If so, can we ever deprecate
> or drop anything? If not, are we not breaking upgradeability promises?
>
> 2) If there are test failures with the new values, it seems REALLY
>> IMPORTANT to make sure those test failures are discovered + fixed IN THE
>> FUTURE TOO. If pushing new yaml into a different file makes us less likely
>> to catch the failures in the future, it seems like we’re hurting ourselves.
>> Branimir mentions this, but how do we ensure that we don’t let this pattern
>> disguise future bugs?
>
> The main objective of this patch is to ensure that the second yaml is
> tested too, pre-commit. We were not doing this for all features we tell
> users are supported.
>
> Paulo:
>
>> - if cassandra_latest.yaml becomes the new default configuration for 6.0,
>> then precommit only needs to be run against thatversion - prerelease needs
>> to be run against all cassandra.yaml variants.
>
> Assuming we keep the pace of development, there will be new "latest"
> features in 6.0 (e.g. Accord could be one). The idea is more to move some
> of the settings from latest to default when they are deemed mature enough.
>
> Josh:
>
>> I propose to significantly reduce that stuff. Let's distinguish the
>> packages of tests that need to be run with CDC enabled / disabled, with
>> commitlog compression enabled / disabled, tests that verify sstable formats
>> (mostly io and index I guess), and leave other parameters set as with the
>> latest configuration - this is the easiest way I think.
>> For dtests we have vnodes/no-vnodes, offheap/onheap, and nothing about
>> other stuff. To me running no-vnodes makes no sense because no-vnodes is
>> just a special case of vnodes=1. On the other hand offheap/onheap buffers
>> could be tested in unit tests. In short, I'd run dtests only with the
>> default and latest configuration.
>
> Some of these changes are already done in this ticket.
>
> Regards,
> Branimir
>
>
>
> On Thu, Feb 15, 2024 at 3:08 PM Paulo Motta  wrote:
>
>> > It's also been questioned about why we don't just enable settings we
>> recommend.  These are settings we recommend for new clusters.  *Our
>> existing cassandra.yaml needs to be tailored for existing clusters being
>> upgraded, where we are very conservative about changing defaults.*
>>
>> I think this unnecessarily penalizes new users with subpar defaults and
>> existing users who wish to use optimized/recommended defaults and need to
>> maintain additional logic to support that. This change offers an
>> opportunity to revisit this.
>>
>> Is not updating the default cassandra.yaml with new recommended
>> configuration just to protect existing clusters from accidentally
>> overriding cassandra.yaml with a new version during major upgrades? If so,
>> perhaps we could add a new explicit flag “enable_major_upgrade: false” to
>> “cassandra.yaml” that fails startup if an upgrade is detected and force
>> operators to review the configuration before a major upgrade?
>>
>> Related to Jeff’s question, I think we need a way to consolidate “latest
>> recommended settings” into “old compatible default” when cutting a new
>> major version, otherwise the files will diverge perpetually.
>>
>> I think cassandra_latest.yaml offers a way to “buffer” proposals for
>> default configuration changes which are consolidated into “cassandra.yaml”
>> in the subsequent major release, eventually converging configurations and
>> reducing the maintenance burden.
>>
>> On Thu, 15 Feb 2024 at 04:24 Mick Semb We

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-16 Thread Jacek Lewandowski
We should conclude this discussion by answering Branimir's original
question.* I vote for merging that and exposing issues to the CI.*

For pre-commit optimization I've opened
https://issues.apache.org/jira/browse/CASSANDRA-19406 epic and we should
add exact tasks there to make this valuable discussion result in some
concrete actions. Then, we can discuss each task in a more organized way.

czw., 15 lut 2024 o 21:29 Štefan Miklošovič 
napisał(a):

> I love the idea of David to make this dual config stuff directly the part
> of the tests, I just leave this here where I quickly put some super
> primitive runner together
>
>
> https://github.com/smiklosovic/cassandra/commit/693803772218b52c424491b826c704811d606a31
>
> We could just run by default with one config and annotate it with all
> configs if we think this is crucial to test in both scenarios.
>
> Anyway, happy to expand on this but I do not want to block any progress in
> the ticket, might come afterwards, just showing what is possible.
>
> On Thu, Feb 15, 2024 at 7:59 PM David Capwell  wrote:
>
>> This thread got large quick, yay!
>>
>> is there a reason all guardrails and reliability (aka repair retries)
>> configs are off by default?  They are off by default in the normal config
>> for backwards compatibility reasons, but if we are defining a config saying
>> what we recommend, we should enable these things by default IMO.
>>
>> This is one more question to be answered by this discussion. Are there
>> other options that should be enabled by the "latest" configuration? To what
>> values should they be set?
>> Is there something that is currently enabled that should not be?
>>
>>
>> Very likely, we should try to figure that out.  We should also answer how
>> conservative do we want to be by default?  There are many configs we need
>> to flesh out here, glad to help with the configs I authored (prob best for
>> JIRA rather than this thread)
>>
>>
>> Should we merge the configs breaking these tests?  No…. When we have
>> failing tests people do not spend the time to figure out if their logic
>> caused a regression and merge, making things more unstable… so when we
>> merge failing tests that leads to people merging even more failing tests...
>>
>> In this case this also means that people will not see at all failures
>> that they introduce in any of the advanced features, as they are not tested
>> at all. Also, since CASSANDRA-19167 and 19168 already have fixes, the
>> non-latest test suite will remain clean after merge. Note that these two
>> problems demonstrate that we have failures in the configuration we ship
>> with, because we are not actually testing it at all. IMHO this is a problem
>> that we should not delay fixing.
>>
>>
>> I am not arguing we should not get this into CI, but more we should fix
>> the known issues before getting into CI… its what we normally do, I don’t
>> see a reason to special case this work.
>>
>> I am 100% cool blocking 5.0 on these bugs found (even if they are test
>> failures), but don’t feel we should enable in CI until these issues are
>> resolved; we can add the yaml now, but not the CI pipelines.
>>
>>
>> 1) If there’s an “old compatible default” and “latest recommended
>> settings”, when does the value in “old compatible default” get updated?
>> Never?
>>
>>
>> How about replacing cassandra.yaml with cassandra_latest.yaml on trunk
>> when cutting cassandra-6.0 branch? Any new default changes on trunk go to
>> cassandra_latest.yaml.
>>
>>
>> I feel its dangerous to define this at the file level and should do at
>> the config level… I personally see us adding new features disabled by
>> default in cassandra.yaml and the recommended values in
>> Cassandra-latest.yaml… If I add a config in 5.1.2 should it get enabled by
>> default in 6.0?  I don’t feel thats wise.
>>
>> Maybe it makes sense to annotate the configs with the target version for
>> the default change?
>>
>> Let's distinguish the packages of tests that need to be run with CDC
>> enabled / disabled, with commitlog compression enabled / disabled, tests
>> that verify sstable formats (mostly io and index I guess), and leave other
>> parameters set as with the latest configuration - this is the easiest way I
>> think.
>>
>>
>> Yes please!  I really hate having a pipeline per config, we should
>> annotate this some how in the tests that matter… junit can param the tests
>> for us so we cover the different configs the test supports… I have written
>> many tests that are costly and run on all these other pipelines but have 0
>> change in the config… just wasting resources rerunning…
>>
>> Pushing this to the test also is a better author/maintainer experience…
>> running the test in your IDE and seeing all the params and their results is
>> so much better than monkeying around with yaml files and ant…. My repair
>> simulation tests have a hack flag to try to switch the yaml to make it
>> easier to test against the other configs and I loath it so much…
>>
>> To me