[ https://issues.apache.org/jira/browse/CASSANDRA-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-5667:
--------------------------------------

    Attachment: 5667.txt

Patch attached to move contention retry into {{beginAndRepairPaxos}} and use
max(current time from system clock, inProgress + 1) as the ballot.

Also updates in_progress_ballot on commit if necessary, to preserve the
guarantee that we won't issue a promise for any ballot lower than one we've
already seen.

> Change timestamps used in CAS ballot proposals to be more resilient to clock skew
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5667
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5667
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 2.0 beta 1
>         Environment: n/a
>            Reporter: Nick Puz
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 2.0 beta 1
>
>         Attachments: 5667.txt
>
>
> The current time is used to generate the timeuuid used for CAS ballot
> proposals, with the logic that if a newer proposal exists then the current
> one needs to complete that and re-propose. The problem is that if a machine
> has clock skew and drifts into the future, it will propose with a large
> timestamp (which will get accepted), but then subsequent proposals with lower
> (but correct) timestamps will not be able to proceed. This will prevent CAS
> write operations and also reads at serializable consistency level.
> The workaround is to initially propose with the current time (current
> behavior), but if the proposal fails due to a larger existing one, re-propose
> (after completing the existing one if necessary) with the max of
> (currentTime, mostRecent+1, proposed+1).
> Since small drift is normal between different nodes in the same datacenter,
> this can happen even if NTP is working properly and a write hits one node and
> a subsequent serialized read hits another. In the case of NTP config issues
> (or OS bugs with time, especially around DST) the unavailability window could
> be much larger.

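
The ballot rule described above, max(current time, inProgress + 1), can be illustrated with a minimal Java sketch. This is not the code in 5667.txt: the class and method names (`BallotSketch`, `nextBallotMicros`, `Replica`) are hypothetical, and plain `long` microsecond timestamps stand in for the timeuuid ballots that the issue describes. It only shows why a proposer with a correct clock can still make progress after a skewed node has proposed a ballot from the future, and why updating the in-progress ballot on commit keeps promises monotonic.

```java
public final class BallotSketch {
    private BallotSketch() {}

    /**
     * Hypothetical helper: pick the next ballot timestamp.
     * It is never behind the local clock and always strictly above the
     * highest in-progress ballot seen so far.
     * (Overflow at Long.MAX_VALUE is ignored in this sketch.)
     */
    public static long nextBallotMicros(long clockMicros, long inProgressMicros) {
        return Math.max(clockMicros, inProgressMicros + 1);
    }

    /** Hypothetical replica state: remembers the highest ballot it has seen. */
    public static final class Replica {
        private long inProgress = Long.MIN_VALUE;

        /** Promise only for ballots strictly newer than anything seen before. */
        public boolean promise(long ballot) {
            if (ballot <= inProgress) {
                return false;
            }
            inProgress = ballot;
            return true;
        }

        /** On commit, raise the in-progress ballot if the committed one is newer. */
        public void commit(long ballot) {
            if (ballot > inProgress) {
                inProgress = ballot;
            }
        }

        public long inProgress() {
            return inProgress;
        }
    }

    public static void main(String[] args) {
        Replica replica = new Replica();
        long skewedClock = 2_000_000L;   // node whose clock drifted into the future
        long correctClock = 1_000_000L;  // node with an accurate clock

        // The skewed node proposes first and is accepted.
        System.out.println("skewed proposal promised: " + replica.promise(skewedClock));

        // Using the raw clock, the correct node is rejected.
        System.out.println("raw-clock proposal promised: " + replica.promise(correctClock));

        // Using max(clock, inProgress + 1), the retry goes through.
        long retry = nextBallotMicros(correctClock, replica.inProgress());
        System.out.println("retry ballot " + retry + " promised: " + replica.promise(retry));
    }
}
```

The design choice worth noting is that the local clock is still preferred whenever it is ahead of the last seen ballot, so ballots track real time in the common case and only fall back to inProgress + 1 when another node's clock has run ahead.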
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira