[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock

2016-06-22 Thread sankalp kohli (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345678#comment-15345678 ]

sankalp kohli commented on CASSANDRA-11991:
---

+1. [~slebresne], please commit it.

> On clock skew, paxos may "corrupt" the node clock
> -
>
> Key: CASSANDRA-11991
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11991
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> We made a mistake in CASSANDRA-9649 such that a temporary clock skew on one node 
> can "corrupt" other nodes' clocks through Paxos. That wasn't intended and we 
> should fix it. I'll attach a patch later.
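A minimal Java sketch of the failure mode described above, under stated assumptions: one node-wide, strictly monotonic timestamp source (the role the thread gives {{ClientState#lastTimestampMicros}}) that is shared by regular writes and Paxos, and whose Paxos path also adopts a timestamp reported by other replicas. The class and method names are hypothetical, this is not Cassandra's actual code, and the wall clock is injected so the skew can be simulated.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical model (not Cassandra's code): one node-wide, strictly
// monotonic timestamp source shared by regular writes and Paxos.
class SkewProneClock {
    private final AtomicLong lastMicros = new AtomicLong(0);
    private final LongSupplier wallClockMicros; // injected to simulate skew

    SkewProneClock(LongSupplier wallClockMicros) {
        this.wallClockMicros = wallClockMicros;
    }

    // Regular (non-Paxos) write: max(wall clock, last + 1).
    long nextTimestamp() {
        return advanceTo(wallClockMicros.getAsLong());
    }

    // Flawed Paxos path: a timestamp learned from other replicas is folded
    // into the shared clock, so one replica running ahead drags this node's
    // clock (and every later regular write) into the future.
    long paxosTimestamp(long minFromReplicas) {
        return advanceTo(Math.max(wallClockMicros.getAsLong(), minFromReplicas));
    }

    // Atomically store and return max(candidate, last + 1).
    private long advanceTo(long candidate) {
        while (true) {
            long last = lastMicros.get();
            long ts = last >= candidate ? last + 1 : candidate;
            if (lastMicros.compareAndSet(last, ts)) {
                return ts;
            }
        }
    }
}
```

In this model, once the node runs a Paxos round after seeing a proposal timestamped an hour ahead, its next regular write is also stamped an hour ahead, even though its own wall clock is correct.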



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock

2016-06-22 Thread Jason Brown (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344227#comment-15344227 ]

Jason Brown commented on CASSANDRA-11991:
-

bq.  I think having our CQL clock strictly monotonic per-node (rather than 
per-connection) is cleaner and less surprising to people.

ok, I'll buy that.

Otherwise, +1 on all branches. I checked out the test results, and the ones 
that failed were either 1) unrelated to this CAS change or 2) test timeouts 
(and still unrelated to CAS).



[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock

2016-06-22 Thread Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343886#comment-15343886 ]

Sylvain Lebresne commented on CASSANDRA-11991:
--

Also, I've merged the patch up (no conflict whatsoever) and started CI on all 
branches:

|| version || utests || dtests ||
| [2.1|https://github.com/pcmanus/cassandra/commits/11991-2.1] | [utests|http://cassci.datastax.com/job/pcmanus-11991-2.1-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-11991-2.1-dtest/] |
| [2.2|https://github.com/pcmanus/cassandra/commits/11991-2.2] | [utests|http://cassci.datastax.com/job/pcmanus-11991-2.2-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-11991-2.2-dtest/] |
| [3.0|https://github.com/pcmanus/cassandra/commits/11991-3.0] | [utests|http://cassci.datastax.com/job/pcmanus-11991-3.0-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-11991-3.0-dtest/] |
| [trunk|https://github.com/pcmanus/cassandra/commits/11991-trunk] | [utests|http://cassci.datastax.com/job/pcmanus-11991-trunk-testall/] | [dtests|http://cassci.datastax.com/job/pcmanus-11991-trunk-dtest/] |

I also updated the comment on top of {{ClientState#lastTimestampMicros}} since 
it didn't fully explain the actual motivation.




[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock

2016-06-22 Thread Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343865#comment-15343865 ]

Sylvain Lebresne commented on CASSANDRA-11991:
--

bq. The only minor nit I have is that CASSANDRA-9649 made 
{{ClientState#lastTimestampMicros}} a static field. I believe this change 
helped the cluster get into a bad state faster wrt the propagation of bad 
timestamps. wdyt about switching it back to being an instance field (not 
static)?

There are really two reasons I made it static:
# CASSANDRA-7801: it's not a huge thing, but it helps users be less confused.
# Because I feel that having it non-static was a mistake in the first place. 
That is, even if we completely forget about Paxos, I think having our CQL clock 
strictly monotonic per-node (rather than per-connection) is cleaner and less 
surprising to people. So I'm not a fan of going back to the older behavior, 
and doing so could be considered a breaking change (it's easier to give new 
guarantees than to take some away).
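To make the per-node versus per-connection distinction concrete, here is a hedged Java sketch: an instance field gives each connection its own monotonic sequence, so two connections can hand out the same timestamp, while a {{static}} field makes a single strictly increasing sequence for the whole node. The class names ({{Monotonic}}, {{PerConnectionClock}}, {{PerNodeClock}}) and the injected wall clock are hypothetical and only model the idea behind {{ClientState#lastTimestampMicros}}.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Shared helper: atomically stores and returns max(now, last + 1).
final class Monotonic {
    static long advance(AtomicLong last, long now) {
        while (true) {
            long prev = last.get();
            long ts = prev >= now ? prev + 1 : now;
            if (last.compareAndSet(prev, ts)) {
                return ts;
            }
        }
    }
}

// Hypothetical per-connection clock: an instance field, so two connections
// may hand out the same timestamp within the same microsecond.
class PerConnectionClock {
    private final AtomicLong lastMicros = new AtomicLong(0);
    private final LongSupplier wallMicros;

    PerConnectionClock(LongSupplier wallMicros) { this.wallMicros = wallMicros; }

    long next() { return Monotonic.advance(lastMicros, wallMicros.getAsLong()); }
}

// Hypothetical per-node clock: a static field, so all connections on the
// node draw from a single strictly increasing sequence.
class PerNodeClock {
    private static final AtomicLong LAST_MICROS = new AtomicLong(0);
    private final LongSupplier wallMicros;

    PerNodeClock(LongSupplier wallMicros) { this.wallMicros = wallMicros; }

    long next() { return Monotonic.advance(LAST_MICROS, wallMicros.getAsLong()); }
}
```

With a frozen wall clock, two {{PerConnectionClock}} instances both return the same value, whereas two {{PerNodeClock}} instances return strictly increasing values. The flip side, which is what this ticket is about, is that anything that pushes the shared static value forward affects every connection on the node.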

I don't dispute that this made the consequences of this bug worse, but I don't 
think removing the {{static}} is minor at all. And the code is now simple enough 
that one can convince oneself we're no longer modifying {{lastTimestampMicros}} 
in bad ways, so hopefully we won't make that mistake again.
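One way to keep the clock static without letting Paxos push it around, sketched in Java under the same assumptions as a hypothetical model (invented class name, injected wall clock; not necessarily identical to the committed patch): the Paxos path still never proposes below the timestamp replicas report, but when that replica-supplied value is the one it ends up using, it returns it without storing it in the shared clock, so regular writes keep following the local wall clock.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch of a Paxos timestamp that cannot corrupt the local clock.
class SkewResistantClock {
    private final AtomicLong lastMicros = new AtomicLong(0);
    private final LongSupplier wallClockMicros; // injected to simulate skew

    SkewResistantClock(LongSupplier wallClockMicros) {
        this.wallClockMicros = wallClockMicros;
    }

    // Regular writes: strictly monotonic per node, as before.
    long nextTimestamp() {
        while (true) {
            long now = wallClockMicros.getAsLong();
            long last = lastMicros.get();
            long ts = last >= now ? last + 1 : now;
            if (lastMicros.compareAndSet(last, ts)) {
                return ts;
            }
        }
    }

    // Paxos: never go below what replicas already accepted, but if that value
    // is the one we end up using (it is ahead of the local clock), return it
    // without storing it, so regular writes keep following the local clock.
    long paxosTimestamp(long minFromReplicas) {
        while (true) {
            long now = Math.max(wallClockMicros.getAsLong(), minFromReplicas);
            long last = lastMicros.get();
            long ts = last >= now ? last + 1 : now;
            if (ts == minFromReplicas || lastMicros.compareAndSet(last, ts)) {
                return ts;
            }
        }
    }
}
```

Note that this version can return the same value for two calls with the same argument, so it relies on Paxos proposals being made unique by something other than the timestamp.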

Besides, if you're smart, you should be switching to client-generated 
timestamps anyway :)
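For completeness, a hedged Java sketch of what client-generated timestamps mean in practice: the application picks a strictly increasing microsecond timestamp and sends it with each write, here through CQL's {{USING TIMESTAMP}} clause. The class, keyspace, table, and column names are hypothetical; drivers typically ship their own timestamp generators, so check your driver's documentation before rolling your own. Ordering is then guaranteed only among writes from the same client process; clock skew between different clients still matters.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side timestamp source: the application, not the
// coordinator node, assigns each write's timestamp.
class ClientTimestampGenerator {
    private final AtomicLong lastMicros = new AtomicLong(0);

    // Strictly increasing microsecond timestamps for this client process,
    // even if the local wall clock stalls or steps backwards.
    long next() {
        while (true) {
            long now = System.currentTimeMillis() * 1000;
            long last = lastMicros.get();
            long ts = last >= now ? last + 1 : now;
            if (lastMicros.compareAndSet(last, ts)) {
                return ts;
            }
        }
    }

    // Illustrative only: attaches the timestamp via CQL's USING TIMESTAMP clause.
    // The keyspace, table and column names are made up.
    static String insertWithTimestamp(long ts) {
        return "INSERT INTO ks.events (id, payload) VALUES (?, ?) USING TIMESTAMP " + ts;
    }
}
```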



[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock

2016-06-21 Thread Jason Brown (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342972#comment-15342972 ]

Jason Brown commented on CASSANDRA-11991:
-

lgtm. The only minor nit I have is that CASSANDRA-9649 made 
{{ClientState#lastTimestampMicros}} a static field. I believe this change 
helped the cluster get into a bad state faster wrt the propagation of bad 
timestamps. wdyt about switching it back to being an instance field (not 
static)?



[jira] [Commented] (CASSANDRA-11991) On clock skew, paxos may "corrupt" the node clock

2016-06-15 Thread sankalp kohli (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332092#comment-15332092 ]

sankalp kohli commented on CASSANDRA-11991:
---

I can review the patch. We should try to get it in for 2.1.15 if possible.
