[jira] [Comment Edited] (CASSANDRA-14448) Improve the performance of CAS

2018-06-07 Thread Dikang Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505576#comment-16505576
 ] 

Dikang Gu edited comment on CASSANDRA-14448 at 6/8/18 1:01 AM:
---

An initial patch is here: 
[trunk|https://github.com/DikangGu/cassandra/commit/da43bb7fc1336e8f2f7e04d84be1a44271eafba9].
 It introduces a new PAXOS_PREPARE_AND_READ verb, to allow in-place upgrade.

I also ran some tests in our internal test cluster, which spans 5 different 
data centers in the US, with 1 replica in each data center. The test client 
performed 10K operations.

I tested two types of use cases: non-contention and contention. In the 
non-contention case, each operation updates a different key. In the contention 
case, there are 5 unique keys in total, and each thread picks one to update.

As a result, there is about a 40% latency improvement in the non-contention 
test. In the contention test, there are some latency improvements as well, and 
there are far fewer timeouts.

 
*non-contention test*

| |C* 3.0.15, without patch| | | | | |fastpaxos, combine prepare and read together| | | | | |
|10K CAS|sync commit| | |async commit| | |sync commit| | |async commit| | |
| |1 thread|5 threads|10 threads|1 thread|5 threads|10 threads|1 thread|5 threads|10 threads|1 thread|5 threads|10 threads|
|Total Time (ms)|1915163|378791|185424|1464694|291807|140827|1422831|284674|142433|948230|187583|94047|
|mean|192.31|188.99|185.08|146.62|145.8|140.41|142.4|142.46|142.23|94.31|93.96|93.93|
|P75|196.52|195.6|194.99|148.08|147.29|146.68|147.46|147.06|146.81|98.86|98.27|98.2|
|P95|209.68|208.23|207.77|160.8|160.25|158.89|147.98|147.48|147.25|99.51|98.69|98.57|
|P99|211.43|212.21|208.61|161.71|160.94|159.9|148.42|147.85|147.6|100.18|99.12|98.86|
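
To make the latency comparison concrete, the relative reductions can be recomputed from the mean values in the non-contention table. The following is only a small arithmetic sketch: the numbers are copied from the 1-thread columns above, while the dictionary layout and the `reduction_pct` helper are illustrative names, not part of the patch.

```python
# Mean latencies copied from the non-contention table (1-thread columns).
# Keys are (build, commit mode); the naming is illustrative only.
MEAN_LATENCY = {
    ("unpatched", "sync"): 192.31,
    ("unpatched", "async"): 146.62,
    ("patched", "sync"): 142.40,
    ("patched", "async"): 94.31,
}


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction, as a percentage of the 'before' value."""
    return (before - after) / before * 100.0


# Effect of the combined prepare-and-read alone, with commit mode held fixed.
sync_gain = reduction_pct(MEAN_LATENCY[("unpatched", "sync")],
                          MEAN_LATENCY[("patched", "sync")])
async_gain = reduction_pct(MEAN_LATENCY[("unpatched", "async")],
                           MEAN_LATENCY[("patched", "async")])
# Effect of both changes together: unpatched sync commit vs. patched async commit.
combined_gain = reduction_pct(MEAN_LATENCY[("unpatched", "sync")],
                              MEAN_LATENCY[("patched", "async")])

print(f"patch only, sync commit:  {sync_gain:.1f}%")
print(f"patch only, async commit: {async_gain:.1f}%")
print(f"patch plus async commit:  {combined_gain:.1f}%")
```

Which of these figures the "about 40%" summary refers to is not stated in the comment, so the sketch prints all three.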

*contention test*

| |C* 3.0.15, without patch| | | | | |fastpaxos, combine prepare and read together| | | | | |
|10K CAS|sync commit| | |async commit| | |sync commit| | |async commit| | |
| |1 thread|5 threads|10 threads|1 thread|5 threads|10 threads|1 thread|5 threads|10 threads|1 thread|5 threads|10 threads|
|Total Time (ms)|1886023|364048|478343|1462954|450799|481059|1432742|285417|330649|937701|305705|30|
|mean|193.41|183.24|510.41|148.48|232.41|493.72|143.28|143.91|334.54|95.55|154.45|301.69|
|P75|199.69|186.09|1008.11|150.5|150.49|1010.68|142.28|149.5|432.55|96.113|101.42|418.88|
|P95|200.58|187.09|1160.3|151.03|1029.75|1167.68|151.19|150.27|1076.05|102.11|615.83|1039.38|
|P99|201.17|189.71|1217.71|151.92|1146.28|1228.08|151.66|150.59|1141.52|102.65|1048.34|1129.77|
|Timeouts|0|0|2093|0|443|2282|0|0|863|0|193|752|
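
The timeout counts in the contention table can be turned into rates and relative reductions with a few lines of arithmetic. This is a sketch under one assumption: each run consists of 10,000 operations in total, as the "10K operations" remark suggests. The `summarize` helper and the dictionary layout are illustrative names only.

```python
# Timeout counts copied from the contention table:
# (threads, commit mode) -> (without patch, with patch).
# Columns with zero timeouts on both sides are omitted.
TIMEOUTS = {
    (5, "async"): (443, 193),
    (10, "sync"): (2093, 863),
    (10, "async"): (2282, 752),
}

# Assumption: every run issues 10,000 CAS operations in total.
OPS_PER_RUN = 10_000


def summarize(timeouts: dict) -> dict:
    """Return timeout rates and the relative reduction for each table column."""
    summary = {}
    for column, (before, after) in timeouts.items():
        summary[column] = {
            "before_rate": before / OPS_PER_RUN,
            "after_rate": after / OPS_PER_RUN,
            "reduction_pct": (before - after) / before * 100.0,
        }
    return summary


summary = summarize(TIMEOUTS)
for (threads, mode), row in summary.items():
    print(f"{threads} threads, {mode} commit: "
          f"{row['before_rate']:.1%} -> {row['after_rate']:.1%} "
          f"({row['reduction_pct']:.0f}% fewer timeouts)")
```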

 



[jira] [Comment Edited] (CASSANDRA-14448) Improve the performance of CAS

2018-05-17 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479653#comment-16479653
 ] 

Dikang Gu edited comment on CASSANDRA-14448 at 5/17/18 8:27 PM:


Thanks, everyone, for the replies!

It is a trade-off. I assume non-contended cases are the majority, and a local 
read is cheaper than cross-DC network requests. Under this assumption, it's the 
right trade-off to me (or at least worth trying). For the contended case, as a 
potential improvement, we can make a replica skip the data read if it has 
already promised a higher ballot than the one being prepared. The other 
replicas will still read their local data, which might be wasted if the ballot 
is not accepted.
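
The proposed skip-the-read optimization can be sketched as follows. This is a toy model, not the patch itself: ballots are plain integers here (the real ones are not), and the `Replica` class, `PrepareAndReadResponse` type, and `read_local` callback are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PrepareAndReadResponse:
    promised: bool            # whether the replica promised the requested ballot
    highest_ballot: int       # highest ballot the replica has promised so far
    data: Optional[str]       # local read result; None when the read was skipped


class Replica:
    """Toy replica handling a combined prepare-and-read request."""

    def __init__(self, read_local: Callable[[], str]):
        self.promised_ballot = 0
        self.read_local = read_local
        self.reads_performed = 0

    def prepare_and_read(self, ballot: int) -> PrepareAndReadResponse:
        if ballot <= self.promised_ballot:
            # A ballot at least as high was already promised, so this prepare
            # is rejected and the local read would be wasted work: skip it.
            return PrepareAndReadResponse(False, self.promised_ballot, None)
        # Otherwise promise the ballot and piggyback the local read.
        self.promised_ballot = ballot
        self.reads_performed += 1
        return PrepareAndReadResponse(True, ballot, self.read_local())


replica = Replica(read_local=lambda: "current-value")
first = replica.prepare_and_read(ballot=5)   # promised, read performed
stale = replica.prepare_and_read(ballot=3)   # rejected, read skipped
print(first, stale, replica.reads_performed)
```

Replicas that do promise the ballot still pay for the read, which is the wasted work mentioned above when the proposal later fails on other replicas.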

For the async commit, the code currently uses "_boolean shouldBlock = 
consistencyLevel != ConsistencyLevel.ANY"_ to decide whether to wait for the 
ack or not. My suggestion here is to emphasize the performance difference 
between using consistency level ANY and any other level in the CAS operation.
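
As a rough Python rendering of the quoted Java expression, the decision could look like the sketch below. The enum lists only a subset of consistency levels, and the `should_block` function name is an illustrative choice rather than something taken from the codebase.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    # Subset of levels, listed here for illustration only.
    ANY = "ANY"
    ONE = "ONE"
    QUORUM = "QUORUM"
    LOCAL_QUORUM = "LOCAL_QUORUM"


def should_block(commit_consistency: ConsistencyLevel) -> bool:
    """Mirror of the quoted check: wait for commit acks unless the level is ANY."""
    return commit_consistency != ConsistencyLevel.ANY


for level in ConsistencyLevel:
    action = "wait for acks" if should_block(level) else "return without waiting"
    print(f"{level.name}: {action}")
```

Under this check, a client that wants the commit to be fire-and-forget would pick ANY as the commit consistency level, and every other level keeps the blocking behavior.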



> Improve the performance of CAS
> --
>
> Key: CASSANDRA-14448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14448
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
>
> I'm working on some performance improvements to lightweight transactions 
> (compare and set).
>  
> As you know, the current CAS requires 4 round trips to finish, which is not 
> efficient, especially in the cross-DC case:
> 1) Prepare
> 2) Quorum read of the current value
> 3) Propose new value
> 4) Commit
>  
> I'm proposing the following improvements to reduce it to 2 round trips:
> 1) Combine prepare and quorum read, using only one round trip to 
> decide the ballot and piggyback the current value in the response.
> 2) Propose the new value, and then send out the commit request asynchronously, 
> so the client does not wait for the ack of the commit. In case of commit 
> failures, we should still have a chance to retry/repair it through hints or 
> subsequent read/CAS events.
>  
> After the improvement, we should be able to finish the CAS operation in 2 
> round trips. There can be follow-up improvements as well, and this can be a 
> starting point.
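
The round-trip accounting in the description above can be sketched as a simple latency model. It assumes every blocking phase costs one full round trip and ignores processing time; the 50 ms round-trip time is a made-up example value, and the phase lists and `blocking_latency_ms` helper are illustrative names.

```python
# Phases the client waits on in the current protocol, one round trip each.
CURRENT_PHASES = ["prepare", "quorum read", "propose", "commit"]

# Proposed protocol: prepare and read share a round trip, and the commit is
# sent asynchronously, so it no longer counts toward client-visible latency.
PROPOSED_BLOCKING_PHASES = ["prepare + read", "propose"]


def blocking_latency_ms(phases: list, rtt_ms: float) -> float:
    """Client-visible latency if each blocking phase costs one round trip."""
    return len(phases) * rtt_ms


# Hypothetical cross-DC round-trip time, chosen only for the example.
RTT_MS = 50.0

current = blocking_latency_ms(CURRENT_PHASES, RTT_MS)
proposed = blocking_latency_ms(PROPOSED_BLOCKING_PHASES, RTT_MS)
print(f"current:  {len(CURRENT_PHASES)} round trips, {current:.0f} ms")
print(f"proposed: {len(PROPOSED_BLOCKING_PHASES)} round trips, {proposed:.0f} ms")
```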



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org






[jira] [Comment Edited] (CASSANDRA-14448) Improve the performance of CAS

2018-05-17 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479498#comment-16479498
 ] 

Jeremiah Jordan edited comment on CASSANDRA-14448 at 5/17/18 6:18 PM:
--

I think it is a bad idea to break the guarantee that when a user tells us to 
write data at LOCAL_QUORUM in their consistency level, the data is on a 
LOCAL_QUORUM of nodes once the query has been acknowledged as successful. With 
an asynchronous commit, that is no longer actually guaranteed.





[jira] [Comment Edited] (CASSANDRA-14448) Improve the performance of CAS

2018-05-17 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479420#comment-16479420
 ] 

Jeremiah Jordan edited comment on CASSANDRA-14448 at 5/17/18 5:23 PM:
--

bq. How does asynchronous commit make any case worse

I want to read back what I asked to be written at LOCAL_QUORUM, and I never do 
an LWT on it again, so I don't use LOCAL_SERIAL in my reads.  In my experience 
this is a common pattern for people who only use LWT for IF NOT EXISTS inserts.  
Reads don't need to take the overhead of using LOCAL_SERIAL if you are only 
using LWT to insert new data when a row does not exist.
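
The concern can be illustrated with a toy model in which a non-serial quorum read only sees committed data, not proposals that were accepted but not yet committed. This is a deliberately simplified sketch, not Cassandra code: the `Replica` class and the `propose`, `apply_commit`, and `quorum_read` helpers are hypothetical, and real reads involve timestamps, read repair, and replica selection that are ignored here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Replica:
    committed: Dict[str, str] = field(default_factory=dict)  # seen by normal reads
    accepted: Dict[str, str] = field(default_factory=dict)   # accepted, not yet committed


def propose(replicas: List[Replica], key: str, value: str) -> None:
    """Every replica accepts the proposal; nothing is committed yet."""
    for replica in replicas:
        replica.accepted[key] = value


def apply_commit(replicas: List[Replica], key: str) -> None:
    """The (possibly delayed) commit moves accepted values into committed data."""
    for replica in replicas:
        if key in replica.accepted:
            replica.committed[key] = replica.accepted.pop(key)


def quorum_read(replicas: List[Replica], key: str) -> Optional[str]:
    """Non-serial read: consults committed data on a majority of replicas only."""
    quorum = replicas[: len(replicas) // 2 + 1]
    for replica in quorum:
        if key in replica.committed:
            return replica.committed[key]
    return None


replicas = [Replica() for _ in range(3)]
propose(replicas, "user:42", "alice")
# With an asynchronous commit, the client may be told "success" at this point.
read_before_commit = quorum_read(replicas, "user:42")
apply_commit(replicas, "user:42")
read_after_commit = quorum_read(replicas, "user:42")
print(read_before_commit, read_after_commit)
```

In this model the first read misses the acknowledged insert, which is the situation a client reading at LOCAL_QUORUM instead of LOCAL_SERIAL would run into.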

