[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749221#comment-16749221 ] Ariel Weisberg commented on CASSANDRA-13508:

I once again forgot that the key already includes the table. My bad! Making storage of the existing state more efficient is totally reasonable. That means the rest of this is just a tangent; if you are only interested in this ticket, you can skip it.

bq. Keyspace-level LWTs would be cool, but I think they've always been more of a pony than a feature anyone was seriously considering implementing.

Keyspace-level LWTs wouldn't make contention worse at the CAS level, because you would still be CASing different tables, so the coordinator can pack multiple CAS operations for different tables into a single Paxos round. Multi-Paxos, or preferring a single coordinator, means you won't have a problem with dueling proposers.

Mostly people don't want CAS anyway. They have some other read/write logic they are trying to accomplish, and that logic can frequently be expressed as a pure function with parameters. Once people express these as pure functions, contention isn't an issue, because you can pack several of them into a single Paxos round and, once the round is accepted, commit them all back to back without running another round. This approach gets throughput up to the point where the bottleneck is reading and writing to execute the transactions, not agreeing on what those transactions are.

The level beyond this is to try to execute transactions concurrently and in parallel, using pessimistic locking or optimistic conflict tracking. Once you can do that, you no longer need pure functions: you can support general transactions and just agree on commit order. Pure functions are just a quick way to increase expressiveness without going whole hog on being a transactional database. It's what a lot of successful systems (e.g. Spanner, DynamoDB) do.
Cassandra is the odd one out even among those, in that you are restricted to one table's worth of schema for any kind of strongly consistent operation. I imagine there are other people with similar ideas about how to support stronger consistency and more expressive transactions in a practical way.

> Make system.paxos table compaction strategy configurable
>
> Key: CASSANDRA-13508
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13508
> Project: Cassandra
> Issue Type: Improvement
> Components: Legacy/Coordination
> Reporter: Jay Zhuang
> Assignee: Jay Zhuang
> Priority: Major
> Labels: LWT, core, paxos
> Fix For: 4.x
>
> Attachments: test11.png, test2.png
>
> The default compaction strategy for the {{system.paxos}} table is LCS, for
> performance reasons (CASSANDRA-7753). But on clusters that use CAS heavily, the
> system is kept busy compacting {{system.paxos}}.
> As the data in the {{paxos}} table is TTL'ed, TWCS might be a better fit. In our
> test, it significantly reduced the number of compactions without impacting
> latency too much:
> !test11.png!
> The time window for TWCS was set to 2 minutes for the test.
> Here is the p99 latency impact:
> !test2.png!
> The yellow one is LCS and the purple one is TWCS. Average p99 latency increased
> by about 10%.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
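A minimal Java sketch of the batching idea in Ariel's comment above: pending pure-function transactions on one partition are agreed on once, then applied back to back. The Paxos round itself is stubbed out (agreeOnBatch only fixes the submission order), and PureOp, BatchingCoordinator and agreeOnBatch are hypothetical names, not Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical "pure function" transaction: the new state depends only on the old state. */
final class PureOp {
    final String name;
    final UnaryOperator<Long> fn;

    PureOp(String name, UnaryOperator<Long> fn) {
        this.name = name;
        this.fn = fn;
    }
}

/** Hypothetical coordinator that runs one agreement round per batch, not per operation. */
final class BatchingCoordinator {
    private final List<PureOp> pending = new ArrayList<>();
    private long state;
    private int roundsRun = 0;

    BatchingCoordinator(long initialState) {
        this.state = initialState;
    }

    void submit(PureOp op) {
        pending.add(op);
    }

    /** Stand-in for a Paxos round: here it simply fixes the order of everything pending. */
    private List<PureOp> agreeOnBatch() {
        roundsRun++;
        List<PureOp> agreed = new ArrayList<>(pending);
        pending.clear();
        return agreed;
    }

    /** One round, then commit every agreed op back to back with no further rounds. */
    long flush() {
        for (PureOp op : agreeOnBatch()) {
            state = op.fn.apply(state);
        }
        return state;
    }

    int roundsRun() {
        return roundsRun;
    }
}
```

Because each operation is a pure function of the prior state, the agreed order alone determines the result. That is why several operations can share one round.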
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746698#comment-16746698 ] Blake Eggleston commented on CASSANDRA-13508:

Keyspace-level LWTs would be cool, but I think they've always been more of a pony than a feature anyone was seriously considering implementing. Implicitly expanding paxos granularity from table/primary_key to keyspace/primary_key isn't a good idea, imo, since it can introduce cross-table contention that users don't necessarily need or want. Also, even if keyspace-level LWTs do become a thing, splitting system.paxos up per table wouldn't necessarily preclude them.
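The cross-table contention point can be shown with a small Java sketch. Assume that two CAS operations contend exactly when they map to the same Paxos instance key. Under table granularity, writes to different tables that share a partition key stay independent. Under keyspace granularity, they collide. PaxosKey and its factory methods are hypothetical illustrations, not Cassandra's actual key layout.

```java
import java.util.Objects;

/** Hypothetical Paxos instance key; the fields it includes decide who contends with whom. */
final class PaxosKey {
    final String scope;        // "keyspace.table" under table granularity, "keyspace" otherwise
    final String partitionKey;

    PaxosKey(String scope, String partitionKey) {
        this.scope = scope;
        this.partitionKey = partitionKey;
    }

    static PaxosKey tableGranularity(String keyspace, String table, String pk) {
        return new PaxosKey(keyspace + "." + table, pk);
    }

    static PaxosKey keyspaceGranularity(String keyspace, String table, String pk) {
        return new PaxosKey(keyspace, pk); // the table is deliberately ignored
    }

    /** Two operations contend if and only if they map to the same Paxos instance. */
    static boolean contend(PaxosKey a, PaxosKey b) {
        return a.equals(b);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PaxosKey)) return false;
        PaxosKey k = (PaxosKey) o;
        return scope.equals(k.scope) && partitionKey.equals(k.partitionKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(scope, partitionKey);
    }
}
```

For example, CAS writes to ks.users and ks.orders, both for partition "alice", do not contend with tableGranularity but do contend with keyspaceGranularity.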
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746655#comment-16746655 ] Ariel Weisberg commented on CASSANDRA-13508:

I don't think a paxos table per table is the right way to go about it. The key for the paxos table should just be the partition key: if multiple tables share the same partition key, then you can do transactions across them. We want to head in the direction of more expressive, not less.
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215827#comment-16215827 ] Jeff Jirsa commented on CASSANDRA-13508:

That's an interesting idea. In the linked CASSANDRA-13548, I noted that the current partition key excludes the CFID, which causes write amplification when system.paxos uses LCS. With a paxos table per table, that problem would be eliminated.
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215821#comment-16215821 ] Blake Eggleston commented on CASSANDRA-13508:

Coming back to this, I think making {{system.paxos}} configurable might be the wrong approach. The system keyspaces typically store system metadata, where the data and query volumes are low enough that table configuration shouldn't be a concern for operators. The paxos table (and the batch log, but let's stick to the paxos table for now) is directly involved in query processing. In applications that are heavy CAS users, it can be the most heavily used table in the cluster.

Maybe it would be better to treat these tables more like index tables, where each table involved in paxos operations gets its own local sidecar table? We could then either make the paxos table inherit the compaction settings from its parent table, or enable separately tuning the paxos table with a {{paxos_compaction}} schema option or something similar. Any thoughts on this?
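Here is a Java sketch of the two options Blake describes for a per-table sidecar paxos table: use an explicit paxos_compaction override when one is set, and otherwise inherit the parent table's compaction settings. The TableOptions class is hypothetical. paxos_compaction is only the option name proposed in the comment and does not exist in Cassandra.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-table schema options for a table with a sidecar paxos table. */
final class TableOptions {
    final Map<String, String> compaction;      // the parent table's compaction options
    final Map<String, String> paxosCompaction; // proposed paxos_compaction option; null if unset

    TableOptions(Map<String, String> compaction, Map<String, String> paxosCompaction) {
        this.compaction = compaction;
        this.paxosCompaction = paxosCompaction;
    }

    /** Sidecar compaction: explicit override if present, otherwise inherited from the parent. */
    Map<String, String> effectivePaxosCompaction() {
        return new HashMap<>(paxosCompaction != null ? paxosCompaction : compaction);
    }
}
```

Inheriting by default keeps the common case zero-configuration. The override covers workloads like the one in the ticket description, where the user table suits LCS but its paxos state might suit TWCS.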
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030443#comment-16030443 ] Jay Zhuang commented on CASSANDRA-13508:

[~bdeggleston] You're right, the configuration is lost after a restart. To make it configurable in {{cassandra.yaml}}, what do you think about options like this:
{noformat}
system_paxos_compaction_strategy:
    - class_name: LeveledCompactionStrategy
      parameters:
          - sstable_size_in_mb: "160"
            tombstone_threshold: "0.2"
{noformat}
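The proposed YAML entry would eventually need to become the flat option map that a table's compaction setting uses. This Java sketch assumes the entry has already been parsed into lists and maps, as a YAML loader would return it. PaxosCompactionConfig and toCompactionOptions are hypothetical names, and system_paxos_compaction_strategy is only the key proposed above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: flattens the proposed YAML entry into a compaction option map. */
final class PaxosCompactionConfig {
    static Map<String, String> toCompactionOptions(List<Map<String, Object>> entry) {
        if (entry == null || entry.size() != 1)
            throw new IllegalArgumentException("expected exactly one strategy entry");

        Map<String, Object> strategy = entry.get(0);
        Object className = strategy.get("class_name");
        if (className == null)
            throw new IllegalArgumentException("class_name is required");

        // "class" first, then each parameter, in the order they were given
        Map<String, String> options = new LinkedHashMap<>();
        options.put("class", className.toString());

        @SuppressWarnings("unchecked")
        List<Map<String, Object>> params =
                (List<Map<String, Object>>) strategy.getOrDefault("parameters", List.of());
        for (Map<String, Object> p : params)
            p.forEach((k, v) -> options.put(k, String.valueOf(v)));
        return options;
    }
}
```

The list-of-maps shape matches the YAML above: a single strategy entry whose parameters are one map of option names to values.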
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027912#comment-16027912 ] Jay Zhuang commented on CASSANDRA-13508:

{quote}
[~kohlisankalp]: Your benchmark is on which version of C*?
{quote}
The test was done on version 3.0.11.
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018533#comment-16018533 ] Blake Eggleston commented on CASSANDRA-13508:

Thinking about this a bit more, TWCS might be a better choice for some workloads. LCS should be the best choice when a table's keys see regular activity over a long period of time. For workloads where you're just using paxos as a step in something like a user signup process, though, where you're unlikely to have multiple hits on the same user over time, TWCS could be a better choice.
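This rough Java model shows why access pattern matters. Under TWCS, data written in different time windows ends up in different SSTables, so a key that is touched repeatedly over a long period is spread across roughly one SSTable per window it was written in. A one-shot key, such as a signup, stays in one window. The model deliberately ignores flush timing, bloom filters and expired-window cleanup. TwcsSpread and windowsTouched are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

/** Rough model: how many TWCS time windows (and so, roughly, SSTables) one key's writes span. */
final class TwcsSpread {
    static int windowsTouched(long[] writeTimesSeconds, long windowSeconds) {
        if (windowSeconds <= 0)
            throw new IllegalArgumentException("windowSeconds must be > 0");
        Set<Long> windows = new HashSet<>();
        for (long t : writeTimesSeconds) {
            windows.add(Math.floorDiv(t, windowSeconds)); // index of the window containing t
        }
        return windows.size();
    }
}
```

With the 2-minute window from the ticket's test, a burst of writes for one signup stays in a single window. A key touched every 10 minutes lands in a new window each time, which is the case where LCS's tighter per-key layout should help reads.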
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018167#comment-16018167 ] sankalp kohli commented on CASSANDRA-13508:

Your benchmark is on which version of C*?
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018140#comment-16018140 ] Blake Eggleston commented on CASSANDRA-13508:

I agree, LCS is probably the better choice, and it is the default in trunk. I think being able to tune the paxos table might not be a bad idea, given how heavily it can be used in some systems, but it also has some risks.

First, supporting it won't be straightforward. System table schemas are hardcoded (see {{SystemKeyspace}}), so just allowing ALTER TABLE statements against them isn't enough: any changes you make will be lost after a node restart. Storing system table schemas as regular tables is also a non-starter. Any user-configurable system properties would have to be configured in cassandra.yaml or somewhere similar, which, for non-replicated tables, isn't terrible (and isn't the same thing as the schema.xml file used circa 0.6 that this will remind people of).

[~iamaleksey], do you have any thoughts on this?
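This toy Java model (not Cassandra code) shows why an ALTER against a hardcoded system table does not stick. It assumes that at each startup the table definition is rebuilt from the hardcoded default, so an in-memory ALTER is lost. An override read from configuration is reapplied on every start. ToyNode and the yamlOverride parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a node whose system.paxos definition is rebuilt from code at every startup. */
final class ToyNode {
    private static final Map<String, String> HARDCODED_PAXOS_COMPACTION =
            Map.of("class", "LeveledCompactionStrategy");

    private final Map<String, String> yamlOverride; // hypothetical cassandra.yaml setting; empty if unset
    private Map<String, String> paxosCompaction;

    ToyNode(Map<String, String> yamlOverride) {
        this.yamlOverride = yamlOverride;
        start();
    }

    /** Startup: use the config override if present, otherwise the hardcoded default. */
    void start() {
        paxosCompaction = new HashMap<>(
                yamlOverride.isEmpty() ? HARDCODED_PAXOS_COMPACTION : yamlOverride);
    }

    /** Models ALTER TABLE system.paxos: changes only the in-memory definition. */
    void alter(Map<String, String> newCompaction) {
        paxosCompaction = new HashMap<>(newCompaction);
    }

    String compactionClass() {
        return paxosCompaction.get("class");
    }
}
```

Because the override is read from local configuration rather than replicated schema, each node applies its own setting. As noted above, that is tolerable for a non-replicated table like system.paxos.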
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016627#comment-16016627 ] Jeff Jirsa commented on CASSANDRA-13508:

Also cc [~kohlisankalp] and [~bdeggleston], both of whom have spent quite a bit of time thinking about paxos.

I think LCS is better than TWCS for read performance (if we have to go to disk), so I'm glad you've closed that as Won't Fix. Tuning compaction (or, more likely, compression, as dropping the compression chunk size can be really helpful for read performance) may be useful. However, I suspect your patch allows altering more than just properties: it would also let users edit the schema structure, which we should definitely prevent.
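One way to get "properties yes, structure no" is an allow-list check on ALTER statements that target system.paxos. This Java sketch assumes the statement has already been classified by kind. SystemTableAlterGuard, AlterKind and the allow-list contents (compaction, compression) are hypothetical and are not Cassandra's actual validation code.

```java
import java.util.Set;

/** Hypothetical guard: allow tuning properties on system.paxos, reject structural changes. */
final class SystemTableAlterGuard {
    enum AlterKind { SET_PROPERTY, ADD_COLUMN, DROP_COLUMN, RENAME_COLUMN, ALTER_COLUMN_TYPE }

    private static final Set<String> ALLOWED_PROPERTIES = Set.of("compaction", "compression");

    static void validate(AlterKind kind, String propertyName) {
        if (kind != AlterKind.SET_PROPERTY)
            throw new IllegalArgumentException(
                    "schema structure of system tables cannot be altered: " + kind);
        // null check first: Set.of(...).contains(null) throws NullPointerException
        if (propertyName == null || !ALLOWED_PROPERTIES.contains(propertyName))
            throw new IllegalArgumentException(
                    "property not tunable on system tables: " + propertyName);
    }
}
```

An allow-list fails closed: any property or statement kind not explicitly permitted is rejected.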
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016611#comment-16016611 ] Jay Zhuang commented on CASSANDRA-13508:

Hi [~jjirsa], what do you think about this? The paxos table can be very large. In our case, it's actually even larger than one of our user tables, because all CAS write data is stored in the paxos table with a gc_grace TTL. If several tables use CAS writes, the paxos table gets very large. Tuning its compaction would be useful. This patch gives users the option to change or configure the compaction strategy. Let me know if you have any suggestions.
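A back-of-the-envelope Java sketch of why the paxos table grows so large. The sketch assumes that every CAS write touches a partition key not written before and that each paxos row lives for the TTL (gc_grace_seconds). By Little's law, the number of live paxos rows is then roughly the write rate times the TTL. The inputs used in the example are hypothetical, and the model ignores replication, compression and tombstones. PaxosSizeEstimate is a hypothetical name.

```java
/** Rough upper-bound estimate of live paxos rows and bytes (Little's law: rate x lifetime). */
final class PaxosSizeEstimate {
    static long liveRows(long casWritesPerSecond, long ttlSeconds) {
        return Math.multiplyExact(casWritesPerSecond, ttlSeconds); // throws on overflow
    }

    static long approxBytes(long casWritesPerSecond, long ttlSeconds, long bytesPerRow) {
        return Math.multiplyExact(liveRows(casWritesPerSecond, ttlSeconds), bytesPerRow);
    }
}
```

For example, at a hypothetical 1,000 unique-key CAS writes per second with the default gc_grace_seconds of 864000 (10 days), that is 864,000,000 live rows. At an assumed 1 KiB per row, this comes to about 885 GB before replication, which is consistent with the paxos table outgrowing a user table.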
[jira] [Commented] (CASSANDRA-13508) Make system.paxos table compaction strategy configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-13508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003305#comment-16003305 ] Jay Zhuang commented on CASSANDRA-13508:

Please review.
|Diff | [trunk|https://github.com/apache/cassandra/compare/trunk...cooldoger:13508-trunk?expand=1] |
|patch | [13508-trunk.patch|https://github.com/apache/cassandra/commit/36d04b2bcfffcfa82b60de122ada04d9d8d2a245.patch] |