[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210491#comment-17210491 ] Paulo Motta edited comment on CASSANDRA-13701 at 10/8/20, 10:31 PM:

I was able to improve the runtime of vnode dtests by around 50% on my local machine by [making CCM start nodes in parallel|https://github.com/pauloricardomg/ccm/commit/3b21db1a46b596c2b4850c076e035b5251d7dc39] with a new flag, {{-Dcassandra.init.wait_for_live_members}}. [This flag|https://github.com/pauloricardomg/cassandra/commit/d03956b088e0f408ade607c55182619d593c8519] makes the node wait until a specified number of nodes is live *and* part of the ring before proceeding with bootstrap. This ensures the processes are started in parallel but tokens are assigned sequentially. So the first node is started with {{-Dcassandra.init.wait_for_live_members=0}}, the second node with {{-Dcassandra.init.wait_for_live_members=1}}, the third node with {{-Dcassandra.init.wait_for_live_members=2}}, and so on.

It's a bit hacky, but it seems to improve runtimes significantly since we can parallelize a big chunk of the startup time. I'm running this on a very slow machine, so we might see bigger improvements on better CI machines. The good news is that in the non-vnode case the tokens are assigned manually via CCM, so we don't need to make nodes start sequentially and the runtimes in the non-vnode case are unchanged.

[~e.dimitrova], would you (or someone else with CI access) mind re-running the tests above with the branches below to see how the runtimes look with this change?
* [cassandra|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-13701]
* [dtest|https://github.com/pauloricardomg/cassandra-dtest/tree/CASSANDRA-13701]
* [ccm|https://github.com/pauloricardomg/ccm/tree/CASSANDRA-13701]

(cc [~mck] since this is related to CASSANDRA-16079)

was (Author: pauloricardomg):
I was able to improve the runtime of a few vnode dtests by around 50% by [making CCM start nodes in parallel|https://github.com/pauloricardomg/ccm/commit/3b21db1a46b596c2b4850c076e035b5251d7dc39] with a new flag, {{-Dcassandra.init.wait_for_live_members}}. [This flag|https://github.com/pauloricardomg/cassandra/commit/d03956b088e0f408ade607c55182619d593c8519] makes the node wait until a specified number of nodes is live *and* part of the ring before proceeding with bootstrap. This ensures the processes are started in parallel but tokens are assigned sequentially. So the first node is started with {{-Dcassandra.init.wait_for_live_members=0}}, the second node with {{-Dcassandra.init.wait_for_live_members=1}}, the third node with {{-Dcassandra.init.wait_for_live_members=2}}, and so on.

It's a bit hacky, but it seems to improve runtimes significantly since we can parallelize a big chunk of the startup time. I'm running this on a very slow machine, so we might see bigger improvements on better CI machines. The good news is that in the non-vnode case the tokens are assigned manually via CCM, so we don't need to make nodes start sequentially and the runtimes in the non-vnode case are unchanged.

[~e.dimitrova], would you (or someone else with CI access) mind re-running the tests above with the branches below to see how the runtimes look with this change?
* [cassandra|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-13701]
* [dtest|https://github.com/pauloricardomg/cassandra-dtest/tree/CASSANDRA-13701]
* [ccm|https://github.com/pauloricardomg/ccm/tree/CASSANDRA-13701]

(cc [~mck] since this is related to CASSANDRA-16079)

> Lower default num_tokens
>
>                 Key: CASSANDRA-13701
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13701
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Chris Lohfink
>            Assignee: Alexander Dejanovski
>            Priority: Low
>             Fix For: 4.0-alpha
>
> For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not
> necessary. It is very expensive for operations processes and scanning. It's
> come up a lot, and it's pretty standard and well known within the community
> now to always reduce num_tokens. We should just lower the defaults.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
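The ordering that the {{-Dcassandra.init.wait_for_live_members}} flag is described as enforcing (processes start in parallel, tokens are assigned one node at a time) can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the CCM or Cassandra patch linked above: threads stand in for node processes, a shared list stands in for ring membership, and {{start_cluster}} and {{start_node}} are hypothetical names.

```python
import random
import threading
import time


def start_cluster(num_nodes):
    """Simulate nodes that start in parallel but join the ring one at a time.

    Node i blocks until i members are in the simulated ring, mirroring
    wait_for_live_members=i, then "allocates tokens" and joins.
    Returns the order in which nodes joined.
    """
    ring = []                     # join order; stands in for ring membership
    cond = threading.Condition()  # guards ring and wakes waiting nodes

    def start_node(i):
        # Startup work that overlaps across nodes (the parallel part).
        time.sleep(random.uniform(0, 0.05))
        with cond:
            # Wait until i nodes are live and part of the ring.
            cond.wait_for(lambda: len(ring) >= i)
            # Token allocation would happen here; only one node is eligible
            # at a time, so it sees every earlier member's tokens.
            ring.append(i)
            cond.notify_all()

    threads = [threading.Thread(target=start_node, args=(i,))
               for i in range(num_nodes)]
    for t in reversed(threads):   # launch order deliberately reversed
        t.start()
    for t in threads:
        t.join()
    return ring


print(start_cluster(5))
```

Whatever the launch order or sleep durations, this prints {{[0, 1, 2, 3, 4]}}: node i can only join once nodes 0 through i-1 are in the ring, while the sleeps (the startup work) still overlap.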
[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210491#comment-17210491 ] Paulo Motta edited comment on CASSANDRA-13701 at 10/8/20, 10:30 PM:

I was able to improve the runtime of a few vnode dtests by around 50% by [making CCM start nodes in parallel|https://github.com/pauloricardomg/ccm/commit/3b21db1a46b596c2b4850c076e035b5251d7dc39] with a new flag, {{-Dcassandra.init.wait_for_live_members}}. [This flag|https://github.com/pauloricardomg/cassandra/commit/d03956b088e0f408ade607c55182619d593c8519] makes the node wait until a specified number of nodes is live *and* part of the ring before proceeding with bootstrap. This ensures the processes are started in parallel but tokens are assigned sequentially. So the first node is started with {{-Dcassandra.init.wait_for_live_members=0}}, the second node with {{-Dcassandra.init.wait_for_live_members=1}}, the third node with {{-Dcassandra.init.wait_for_live_members=2}}, and so on.

It's a bit hacky, but it seems to improve runtimes significantly since we can parallelize a big chunk of the startup time. I'm running this on a very slow machine, so we might see bigger improvements on better CI machines. The good news is that in the non-vnode case the tokens are assigned manually via CCM, so we don't need to make nodes start sequentially and the runtimes in the non-vnode case are unchanged.

[~e.dimitrova], would you (or someone else with CI access) mind re-running the tests above with the branches below to see how the runtimes look with this change?
* [cassandra|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-13701]
* [dtest|https://github.com/pauloricardomg/cassandra-dtest/tree/CASSANDRA-13701]
* [ccm|https://github.com/pauloricardomg/ccm/tree/CASSANDRA-13701]

(cc [~mck] since this is related to CASSANDRA-16079)

was (Author: pauloricardomg):
I was able to improve the runtime of a few vnode dtests by around 50% by making CCM start nodes in parallel with a new flag, {{-Dcassandra.init.wait_for_live_members}}. This flag makes the node wait until a specified number of nodes is live *and* part of the ring before proceeding with bootstrap. This ensures the processes are started in parallel but tokens are assigned sequentially. So the first node is started with {{-Dcassandra.init.wait_for_live_members=0}}, the second node with {{-Dcassandra.init.wait_for_live_members=1}}, the third node with {{-Dcassandra.init.wait_for_live_members=2}}, and so on.

It's a bit hacky, but it seems to improve runtimes significantly since we can parallelize a big chunk of the startup time. I'm running this on a very slow machine, so we might see bigger improvements on better CI machines. The good news is that in the non-vnode case the tokens are assigned manually via CCM, so we don't need to make nodes start sequentially and the runtimes in the non-vnode case are unchanged.

[~e.dimitrova], would you (or someone else with CI access) mind re-running the tests above with the branches below to see how the runtimes look with this change?
* [cassandra|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-13701]
* [dtest|https://github.com/pauloricardomg/cassandra-dtest/tree/CASSANDRA-13701]
* [ccm|https://github.com/pauloricardomg/ccm/tree/CASSANDRA-13701]

(cc [~mck] since this is related to CASSANDRA-16079)
[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179534#comment-17179534 ] Michael Semb Wever edited comment on CASSANDRA-13701 at 8/18/20, 10:38 AM:

CI run [here|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/247/pipeline/258/]. [~adejanovski], could you check if any of [these failures|https://ci-cassandra.apache.org/job/Cassandra-devbranch/247/testReport/] are related?

Based on just the one run (though there were no pipelines running in ci-cassandra.a.o at the time), the total runtime for devbranch tests (trunk based) went from ~16 hrs to ~27 hrs. But, due to parallelisation, the pipeline run times have only increased from ~2 hrs to ~2 hrs 20 min. This is not ideal, but I think it's worth treating the fixing and further improving of dtest performance as out of scope and moving it to a separate ticket.

was (Author: michaelsembwever):
CI run [here|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/247/pipeline/258/]. [~adejanovski], could you check if any of [these failures|https://ci-cassandra.apache.org/job/Cassandra-devbranch/247/testReport/] are related?

Based on just the one run (though there were no pipelines running in ci-cassandra.a.o at the time), the total runtime for devbranch tests (trunk based) went from ~16 hrs to ~27 hrs. But, due to parallelisation, the pipeline run times have only increased from ~2 hrs to ~2 hrs 20 min. This is not ideal, but I think it's worth treating the fixing and further improving of dtest performance as out of scope and moving it to a separate ticket.
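To make the parallelisation point concrete, the approximate figures from that single run can be compared directly. This is only arithmetic on the quoted numbers (~16 h to ~27 h total, ~2 h to ~2 h 20 min wall clock); "effective parallelism" here is simply total test hours divided by wall-clock hours, a simplification that ignores queueing and idle executors.

```python
def pct_increase(before, after):
    """Relative increase from before to after, as a percentage."""
    return (after - before) / before * 100


# Approximate figures quoted in the comment (one CI run).
total_before_h, total_after_h = 16, 27        # summed test runtime, hours
wall_before_h, wall_after_h = 2, 2 + 20 / 60  # pipeline wall clock, hours

total_pct = pct_increase(total_before_h, total_after_h)
wall_pct = pct_increase(wall_before_h, wall_after_h)

# Test hours completed per wall-clock hour (a crude parallelism measure).
par_before = total_before_h / wall_before_h
par_after = total_after_h / wall_after_h

print(f"total runtime: +{total_pct:.0f}%, wall clock: +{wall_pct:.0f}%")
print(f"effective parallelism: {par_before:.1f}x -> {par_after:.1f}x")
```

On these numbers the summed runtime grows by roughly 69% while the wall clock grows by roughly 17%, because the extra work is spread over more parallel executors (about 8x versus about 11.6x).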
[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176492#comment-17176492 ] Brandon Williams edited comment on CASSANDRA-13701 at 8/12/20, 5:10 PM:

bq. then they can both perform the check while none of them is gossiping yet

Just FYI, this is a known limitation: collision checks are best effort since, as you've pointed out, there will always be a small window where you can manage to avoid the check.

was (Author: brandon.williams):
> then they can both perform the check while none of them is gossiping yet

Just FYI, this is a known limitation: collision checks are best effort since, as you've pointed out, there will always be a small window where you can manage to avoid the check.
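The limitation described here is a check-then-act race: the collision check only sees tokens that are already being gossiped, so two nodes that both check inside the window before either starts gossiping can both pass. The deterministic Python sketch below models that window with a plain set; {{collision_check}} and {{bootstrap_pair}} are hypothetical names, and this is not Cassandra's gossip or bootstrap code.

```python
def collision_check(candidate, gossiped):
    """Best-effort check: a node can only see tokens already in gossip."""
    return not (set(candidate) & gossiped)


def bootstrap_pair(tokens_a, tokens_b, b_checks_early):
    """Bootstrap node A, then node B; return whether each passed its check.

    If b_checks_early is True, B runs its check inside the window after A
    has checked but before A starts gossiping its tokens.
    """
    gossiped = set()  # tokens currently visible via gossip
    ok_a = collision_check(tokens_a, gossiped)
    if b_checks_early:
        ok_b = collision_check(tokens_b, gossiped)  # A not visible yet
    if ok_a:
        gossiped |= set(tokens_a)  # A starts gossiping its tokens
    if not b_checks_early:
        ok_b = collision_check(tokens_b, gossiped)  # A is visible now
    if ok_b:
        gossiped |= set(tokens_b)  # B starts gossiping its tokens
    return ok_a, ok_b


# A and B both picked token 20.
print(bootstrap_pair([10, 20], [20, 30], b_checks_early=True))
print(bootstrap_pair([10, 20], [20, 30], b_checks_early=False))
```

With the early check both nodes pass despite sharing token 20; with the later check B is rejected. Closing that window entirely would need coordination beyond a local check against gossip state.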
[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171556#comment-17171556 ] Jeremy Hanna edited comment on CASSANDRA-13701 at 8/5/20, 3:29 PM:

I started down this road but don't think I can get through fixing all of the dtests. On this line in bootstrap_test.py, I changed the {{time.sleep}} to 10 seconds, and it appears to solve that problem: https://github.com/apache/cassandra-dtest/blob/master/bootstrap_test.py#L485

However, there were many tests with replace_address that I'm not sure about. I don't know how or why replace_address would be affected by the new token allocation algorithm. Dimitar said something about parallel bootstrap, but I don't see that. Sometimes no_wait or wait_other_notice is true or false, so I thought it was that, but perhaps someone more familiar with ccm could take a look.

I'm sorry. I really want this to get in for the release, but I don't have the time to dedicate to learning dtest at a deeper level to fix all of these in time.

was (Author: jeromatron):
I started down this road but don't think I can get through fixing all of the dtests. On this line in bootstrap_test.py, I changed the {{time.sleep}} to 10 seconds, and it appears to solve that problem: https://github.com/apache/cassandra-dtest/blob/master/bootstrap_test.py#L485

However, there were many tests with replace_address that I'm not sure about. I don't know how or why replace_address would be affected by the new token allocation algorithm. Dmitri said something about parallel bootstrap, but I don't see that. Sometimes no_wait or wait_other_notice is true or false, so I thought it was that, but perhaps someone more familiar with ccm could take a look.

I'm sorry. I really want this to get in for the release, but I don't have the time to dedicate to learning dtest at a deeper level to fix all of these in time.
> Lower default num_tokens
>
>                 Key: CASSANDRA-13701
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13701
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Chris Lohfink
>            Priority: Low
>             Fix For: 4.0-alpha
>
> For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not
> necessary. It is very expensive for operations processes and scanning. It's
> come up a lot, and it's pretty standard and well known within the community
> now to always reduce num_tokens. We should just lower the defaults.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153368#comment-17153368 ] Michael Semb Wever edited comment on CASSANDRA-13701 at 7/8/20, 8:58 AM:

Cassandra-builds patch: https://github.com/apache/cassandra-builds/compare/master...thelastpickle:mck/13701--num_tokens_16
New dtest CI run: https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-dtest/213

was (Author: michaelsembwever):
Cassandra-builds patch: https://github.com/apache/cassandra-builds/compare/master...thelastpickle:mck/13701--num_tokens_16
New dtest CI run: https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-dtest/212/

> Lower default num_tokens
>
>                 Key: CASSANDRA-13701
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13701
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Chris Lohfink
>            Assignee: Jeremy Hanna
>            Priority: Low
>
> For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not
> necessary. It is very expensive for operations processes and scanning. It's
> come up a lot, and it's pretty standard and well known within the community
> now to always reduce num_tokens. We should just lower the defaults.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152460#comment-17152460 ] Jeremy Hanna edited comment on CASSANDRA-13701 at 7/7/20, 3:28 AM:

Can we also standardize the tests to use the default values, that is, change them from 32 to the new defaults (16 {{num_tokens}} with {{allocate_tokens_for_local_replication_factor=3}} uncommented)?

was (Author: jeromatron):
Can we also standardize the tests to use the default values - that is, from 32 to the new defaults (16 {{num_tokens}} with {{allocate_tokens_for_local_replication_factor=3}} uncommented.
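One way to standardize the tests on those defaults is to keep them in a single place and let individual tests override them. The sketch below is hypothetical: {{VNODE_TEST_DEFAULTS}}, {{vnode_config}} and {{to_yaml_lines}} are not part of cassandra-dtest. The resulting dict could then be handed to CCM, for example through ccmlib's {{set_configuration_options(values=...)}}, assuming the ccmlib version in use provides it.

```python
# Hypothetical single source of truth for vnode test settings,
# using the default values proposed in the comment above.
VNODE_TEST_DEFAULTS = {
    "num_tokens": 16,
    "allocate_tokens_for_local_replication_factor": 3,
}


def vnode_config(overrides=None):
    """Return vnode settings: the shared defaults plus per-test overrides."""
    config = dict(VNODE_TEST_DEFAULTS)  # copy so the defaults are never mutated
    config.update(overrides or {})
    return config


def to_yaml_lines(config):
    """Render flat key/value settings as cassandra.yaml-style lines."""
    return [f"{key}: {value}" for key, value in sorted(config.items())]


for line in to_yaml_lines(vnode_config()):
    print(line)
```

A test that genuinely needs a different value (for example the old 32) would pass it as an override instead of hard-coding its own full configuration.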
[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551241#comment-16551241 ] Sumanth Pasupuleti edited comment on CASSANDRA-13701 at 7/20/18 8:37 PM:

[CASSANDRA-14557|https://issues.apache.org/jira/browse/CASSANDRA-14557] adds {{default_keyspace_rf}} to the yaml, which could possibly serve the purpose [~KurtG] is referring to in the comment above.

was (Author: sumanth.pasupuleti):
[CASSANDRA-14557|https://issues.apache.org/jira/browse/CASSANDRA-14557] adds {{default_keyspace_rf}}, which could possibly serve the purpose [~KurtG] is referring to in the comment above.

> Lower default num_tokens
>
>                 Key: CASSANDRA-13701
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13701
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Lohfink
>            Priority: Minor
>
> For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not
> necessary. It is very expensive for operations processes and scanning. It's
> come up a lot, and it's pretty standard and well known within the community
> now to always reduce num_tokens. We should just lower the defaults.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095638#comment-16095638 ] Jeremy Hanna edited comment on CASSANDRA-13701 at 7/21/17 1:35 AM:

Adding datacenters with the new algorithm requires some additional configuration. We would need to make users aware of that trade-off against the benefits of fewer token ranges per node when using that algorithm. It's discussed [here|http://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/config/configVnodes.html], but we should make it clearer in the Apache docs as well. We can point to those in the comments around vnode tokens.

So it would be nice to add some more information [here|http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#allocate-tokens-for-keyspace] and then perhaps [here|http://cassandra.apache.org/doc/latest/operating/topo_changes.html], with an additional section about adding a datacenter.

And Jeff: good point about the token allocation; it would be good to track that down before making the new algorithm the default. Even so, I think that even with the old algorithm we could at the very least halve the default number of vnode ranges.

was (Author: jeromatron):
Adding datacenters with the new algorithm requires some additional configuration. We would need to make users aware of that trade-off against the benefits of fewer token ranges per node when using that algorithm. It's discussed [here|http://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/config/configVnodes.html], but we should make it clearer in the Apache docs as well. We can point to those in the comments around vnode tokens.

So it would be nice to add some more information [here|http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#allocate-tokens-for-keyspace] and then perhaps [here|http://cassandra.apache.org/doc/latest/operating/topo_changes.html], with an additional section about adding a datacenter.

> Lower default num_tokens
>
>                 Key: CASSANDRA-13701
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13701
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Lohfink
>            Assignee: Chris Lohfink
>            Priority: Minor
>
> For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not
> necessary. It is very expensive for operations processes and scanning. It's
> come up a lot, and it's pretty standard and well known within the community
> now to always reduce num_tokens. We should just lower the defaults.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
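For a rough sense of what halving (or further lowering) the default buys, the number of token ranges in a ring grows linearly with {{num_tokens}}. This is a simplified sketch assuming a single datacenter and no duplicate tokens, and it ignores replication; 256 is the pre-4.0 default, 16 is the later default mentioned elsewhere in this thread, and the 12-node cluster size is an arbitrary example.

```python
def ring_ranges(nodes, num_tokens):
    """Token ranges in a single-DC ring.

    Every node owns num_tokens tokens and each token closes exactly one
    range, so the ring has nodes * num_tokens ranges in total.
    """
    return nodes * num_tokens


NODES = 12  # arbitrary example cluster size
for num_tokens in (256, 128, 16):  # pre-4.0 default, halved, 4.0 default
    print(f"num_tokens={num_tokens:>3}: {ring_ranges(NODES, num_tokens)} ranges")
```

Operations that work range by range, such as scans and repairs, have proportionally fewer ranges to handle as {{num_tokens}} drops.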
[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095638#comment-16095638 ] Jeremy Hanna edited comment on CASSANDRA-13701 at 7/21/17 1:31 AM:

Adding datacenters with the new algorithm requires some additional configuration. We would need to make users aware of that trade-off against the benefits of fewer token ranges per node when using that algorithm. It's discussed [here|http://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/config/configVnodes.html], but we should make it clearer in the Apache docs as well. We can point to those in the comments around vnode tokens.

So it would be nice to add some more information [here|http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#allocate-tokens-for-keyspace] and then perhaps [here|http://cassandra.apache.org/doc/latest/operating/topo_changes.html], with an additional section about adding a datacenter.

was (Author: jeromatron):
Adding datacenters with the new algorithm requires some additional configuration. We would need to make users aware of that trade-off against the benefits of fewer token ranges per node when using that algorithm. It's discussed [here|http://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/config/configVnodes.html], but we should make it clearer in the Apache docs as well. We can point to those in the comments around vnode tokens.