[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974330#comment-14974330 ]
Sylvain Lebresne commented on CASSANDRA-9328:
---------------------------------------------

bq. you shouldn't retry WTEs

You'll note I didn't mention the term "retry", because in the case of LWT, "handling" WTEs is indeed almost always more involved than a simple retry (since, as you point out, an LWT update won't be idempotent). But "more involved" does not equate to "cannot ever be dealt with". Your application will indeed, most of the time, have to do a read on a WTE to figure out what the current state is and what you should do. And that does mean you have to model things in a way that allows such recovery on WTE.

bq. If CAS is not atomic

Not sure where that hypothesis comes from. CAS is atomic: either all of it will be applied or none of it will. It just happens that there are some situations where you, the client, won't know which one that is. That property is, by the way, not at all specific to Cassandra: take any transaction in any SQL database; if your server dies during the request, your client just won't know whether it's been applied or not.

Don't get me wrong: as said earlier, the fact that we throw WTE much more often than we really should is a shame, and in particular it's a potentially big performance penalty. But that doesn't mean CAS isn't atomic, nor that you can't use it for non-idempotent operations.

Now, I'm happy to have a debate on our CAS implementation and its limitations, but the mailing list is probably a better venue for that. Regarding this ticket (the fact that WTE can be thrown early when there is contention), as I said in a previous comment, no one has so far come up with any idea for how to fix it, so I'll close this as "won't fix", which in this case means: this is a known limitation for which we have no short-term fix. Feel free to re-open if you have a solution to offer. Hopefully, in the long term, moving to EPaxos (CASSANDRA-6246) might make that better.
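
To make the "read on a WTE" recovery concrete, here is a minimal sketch in Python. It does not use any Cassandra driver: `FakeCasStore`, `WriteTimeout`, `add_to_balance`, and the `applied_ops` field are all hypothetical names invented for this illustration, and the store is an in-memory stand-in for a row updated with a conditional write. The assumption is that the data model records a unique operation id alongside each applied update, so that a read after a timeout can tell whether the write went through.

```python
import uuid


class WriteTimeout(Exception):
    """Hypothetical stand-in for a WTE: the client does not know the outcome."""


class FakeCasStore:
    """In-memory stand-in for one row updated via compare-and-set."""

    def __init__(self, balance, fail_modes=()):
        self.row = {"balance": balance, "applied_ops": set()}
        # Scripted behaviour per cas() call: "timeout_applied",
        # "timeout_not_applied", or (by default) "ok".
        self.fail_modes = list(fail_modes)

    def read(self):
        return {"balance": self.row["balance"],
                "applied_ops": set(self.row["applied_ops"])}

    def cas(self, expected_balance, new_balance, op_id):
        mode = self.fail_modes.pop(0) if self.fail_modes else "ok"
        if mode == "timeout_not_applied":
            raise WriteTimeout()
        applied = self.row["balance"] == expected_balance
        if applied:
            # All-or-nothing: balance and op id change together.
            self.row["balance"] = new_balance
            self.row["applied_ops"].add(op_id)
        if mode == "timeout_applied":
            raise WriteTimeout()  # applied, but the client is not told
        return applied


def add_to_balance(store, amount, max_attempts=5):
    """Non-idempotent increment that stays safe across unknown outcomes."""
    op_id = str(uuid.uuid4())
    for _ in range(max_attempts):
        current = store.read()
        if op_id in current["applied_ops"]:
            # An earlier attempt that timed out was in fact applied.
            return current["balance"]
        try:
            new_balance = current["balance"] + amount
            if store.cas(current["balance"], new_balance, op_id):
                return new_balance
        except WriteTimeout:
            pass  # unknown outcome: the next iteration re-reads to find out
    final = store.read()
    if op_id in final["applied_ops"]:
        return final["balance"]
    raise RuntimeError("could not determine or apply the update")
```

With `FakeCasStore(100, ["timeout_applied"])`, the first attempt is applied but reported as a timeout; the follow-up read finds the operation id and the increment is not applied a second time. A real table would need some bounded way of recording operation ids, which this sketch ignores.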

> WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9328
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Aaron Whiteside
>            Priority: Critical
>             Fix For: 2.1.x
>
>         Attachments: CassandraLWTTest.java, CassandraLWTTest2.java
>
>
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms.
> Unit test attached, run against a 3 node cluster running 2.1.5.
> If you reduce the threadCount to 1, you never see a WriteTimeoutException. If the WTE is due to not being able to communicate with other nodes, why does the concurrency >1 cause inter-node communication to fail?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)