[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock
[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345678#comment-15345678 ]

sankalp kohli commented on CASSANDRA-11991:
---

+1 [~slebresne] Please commit it.

> On clock skew, paxos may "corrupt" the node clock
> -
>
> Key: CASSANDRA-11991
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11991
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
> We made a mistake in CASSANDRA-9649 so that a temporary clock skew on one node
> can "corrupt" other node clocks through Paxos. That wasn't intended and we
> should fix that. I'll attach a patch later.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock
[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344227#comment-15344227 ]

Jason Brown commented on CASSANDRA-11991:
-

bq. I think having our CQL clock strictly monotonic per-node (rather than per-connection) is cleaner and less surprising to people.

ok, I'll buy that.

Otherwise, +1 on all branches. I checked out the test results, and the ones that failed were either 1) unrelated to this CAS change or 2) test timeouts (and still unrelated to CAS).
[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock
[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343886#comment-15343886 ]

Sylvain Lebresne commented on CASSANDRA-11991:
--

Also, I've merged the patch up (no conflict whatsoever) and started CI on all branches:

|| version || utests || dtests ||
| [2.1|https://github.com/pcmanus/cassandra/commits/11991-2.1] | [utests|http://cassci.datastax.com/job/pcmanus-11991-2.1-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-11991-2.1-dtest/] |
| [2.2|https://github.com/pcmanus/cassandra/commits/11991-2.2] | [utests|http://cassci.datastax.com/job/pcmanus-11991-2.2-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-11991-2.2-dtest/] |
| [3.0|https://github.com/pcmanus/cassandra/commits/11991-3.0] | [utests|http://cassci.datastax.com/job/pcmanus-11991-3.0-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-11991-3.0-dtest/] |
| [trunk|https://github.com/pcmanus/cassandra/commits/11991-trunk] | [utests|http://cassci.datastax.com/job/pcmanus-11991-trunk-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-11991-trunk-dtest/] |

I also updated the comment on top of {{ClientState#lastTimestampMicros}} since it didn't fully explain the real motivation.
[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock
[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343865#comment-15343865 ]

Sylvain Lebresne commented on CASSANDRA-11991:
--

bq. The only minor nit I have is CASSANDRA-9649 made {{ClientState#lastTimestampMicros}} a static field. I believe this change helped accelerate the cluster getting into a bad state wrt the propagation of bad timestamps. wdyt about switching it back to being an instance field (not static)?

There are really 2 reasons I made it static:
# CASSANDRA-7801: it's not a huge thing, but it helps users be less confused.
# because I feel that having it non-static was a mistake in the first place. That is, even if we completely forget about Paxos, I think having our CQL clock strictly monotonic per-node (rather than per-connection) is cleaner and less surprising to people.

So I'm not a fan of going back to the older behavior, and doing so could be considered a breaking change (it's easier to give new guarantees than to take some away).

I don't disagree that this fact made the consequences of this bug worse, but I don't think removing the {{static}} is minor at all. And that code feels simple enough to convince oneself that we're not modifying {{lastTimestampMicros}} in bad ways anymore, so hopefully we won't make that mistake again. Besides, if you're smart, you should be switching to client generated timestamps anyway :)
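For readers following the static-versus-instance discussion, here is a minimal sketch of a per-node, strictly monotonic microsecond clock, together with one way a Paxos ballot timestamp could honour a minimum received from another node without storing it in the shared clock. This is an illustration only and is not taken from the attached patch: the class {{MonotonicNodeClock}}, both method names, and the injected {{wallClockMicros}} parameter are hypothetical, and only the field name {{lastTimestampMicros}} comes from the thread.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration; not Cassandra's actual ClientState. */
public class MonotonicNodeClock {
    // Static, so every connection on this node shares one clock
    // (per-node rather than per-connection monotonicity).
    private static final AtomicLong lastTimestampMicros = new AtomicLong(0);

    /**
     * Returns a timestamp strictly greater than any value previously
     * stored by this node. wallClockMicros is passed in (an assumption
     * made for testability); a caller would typically supply
     * System.currentTimeMillis() * 1000.
     */
    public static long nextTimestampMicros(long wallClockMicros) {
        while (true) {
            long last = lastTimestampMicros.get();
            long next = Math.max(wallClockMicros, last + 1);
            if (lastTimestampMicros.compareAndSet(last, next))
                return next;
        }
    }

    /**
     * Returns a ballot timestamp that is at least minTimestampMicros,
     * a lower bound assumed to come from another node. If that bound is
     * ahead of the local clock it is returned as-is and NOT stored, so a
     * remote node with a skewed clock cannot drag the shared clock forward.
     */
    public static long timestampForBallot(long wallClockMicros, long minTimestampMicros) {
        while (true) {
            long last = lastTimestampMicros.get();
            long local = Math.max(wallClockMicros, last + 1);
            if (minTimestampMicros > local)
                return minTimestampMicros; // use the remote bound, leave the shared clock alone
            if (lastTimestampMicros.compareAndSet(last, local))
                return local;
        }
    }
}
```

In this sketch, a ballot request carrying a far-future minimum gets that value back, but the next ordinary call to {{nextTimestampMicros}} still returns a value based on the local wall clock. Whether the real fix takes this exact shape is not stated in the thread.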
[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock
[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342972#comment-15342972 ]

Jason Brown commented on CASSANDRA-11991:
-

lgtm.

The only minor nit I have is CASSANDRA-9649 made {{ClientState#lastTimestampMicros}} a static field. I believe this change helped accelerate the cluster getting into a bad state wrt the propagation of bad timestamps. wdyt about switching it back to being an instance field (not static)?
[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock
[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332092#comment-15332092 ]

sankalp kohli commented on CASSANDRA-11991:
---

I can review the patch. We should try to get it in for 2.1.15 if possible.