[ 
https://issues.apache.org/jira/browse/CASSANDRA-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976949#comment-13976949
 ] 

Sylvain Lebresne commented on CASSANDRA-6106:
---------------------------------------------

I'd like to summarize my understanding of what we're trying to fix here.

As far as conflict resolution goes, microsecond resolution is imo rather
useless. Given the accuracy of ntp, network latencies and whatnot, no
application should ever rely on sub-millisecond resolution for conflicts, and
any application that relies on fine-grained ordering of updates to a cell should
really provide client-side timestamps. That doesn't mean we can't use microsecond
resolution if it's easy of course, but it does mean that imo the bar on what
complexity is worth it is rather low.

This was not the original motivation of this ticket however. The original 
motivation was to limit the chance of 2 updates A and B getting the exact same 
timestamp, because when that happens, we could end up with some cell from A and 
some cell from B. I think we all agreed that the proper fix for that was more 
complicated and left to CASSANDRA-6123. Yet, as I said earlier, since that fix 
is much more complicated, I'm fine lowering the chances of timestamp conflicts 
in the meantime if that's easy for us (less often broken is somewhat better 
than more often broken, even if not broken is obviously better). But for this
point, Christopher's solution of randomizing the microsecond bits was actually
really simple and probably good enough.
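
For reference, the randomizing idea boils down to something like the sketch
below. This is only an illustration and not the code from the attached
patches: the class and method names are made up, and it assumes the
millisecond wall clock is simply scaled to microseconds with the low three
decimal digits picked at random.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration; not taken from the attached patches.
final class RandomizedMicros {
    private RandomizedMicros() {}

    // Scale a millisecond wall-clock value to microseconds and fill the
    // sub-millisecond part (0..999) with a random value.
    static long fromMillis(long millis) {
        return millis * 1000L + ThreadLocalRandom.current().nextInt(1000);
    }

    // Current time in microseconds; only the millisecond part is real.
    static long now() {
        return fromMillis(System.currentTimeMillis());
    }
}
```

Two updates landing in the same millisecond then tie about 1 time in 1000
instead of every time. The flip side is that two calls within the same
millisecond can come out in either order, so this only lowers the chance of
ties and says nothing about ordering.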

And to be honest, the complexity of Benedict's branch is above what I consider
reasonable for the concrete problem at hand. I'm surely not very smart, but it
doesn't fit my own definition of straightforward. I'm not saying that it's the 
most complicated thing ever, but it's complicated enough to make me 
uncomfortable, given that even some simple rounding error on the timestamp 
could basically destroy user data.

I'm also not convinced we need that complexity in practice. What about just
having a thread call clock_gettime followed by nanoTime every second or so, and
then adding the nanoTime elapsed since that last clock_gettime call to its
result to get the current time? It might not give us the best possible
timestamp, but it's imo largely good enough for our purpose (and
for clocks going back in time, we already handle that in a brute force kind of
way in QueryState, which is again imo good enough).
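
To make that concrete, here is a rough sketch of what I have in mind. The
names are hypothetical, and System.currentTimeMillis() stands in for
clock_gettime, which is an assumption of the sketch (a real version would
need a native call to get a wall clock finer than milliseconds).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: wall clock sampled periodically, nanoTime used in between.
final class SampledMicroClock {
    // One (wall clock, nanoTime) pair taken at roughly the same instant.
    private static final class Sample {
        final long wallMicros;
        final long nanos;
        Sample(long wallMicros, long nanos) {
            this.wallMicros = wallMicros;
            this.nanos = nanos;
        }
    }

    private volatile Sample last = takeSample();

    private static Sample takeSample() {
        // Assumption: currentTimeMillis() stands in for clock_gettime here.
        return new Sample(System.currentTimeMillis() * 1000L, System.nanoTime());
    }

    // Called every second or so by the background thread.
    void resample() {
        last = takeSample();
    }

    // Sampled wall clock plus the nanoTime elapsed since the sample, in micros.
    static long extrapolate(long wallMicros, long sampleNanos, long nowNanos) {
        return wallMicros + (nowNanos - sampleNanos) / 1000L;
    }

    long nowMicros() {
        Sample s = last; // read once so both fields come from the same sample
        return extrapolate(s.wallMicros, s.nanos, System.nanoTime());
    }

    // Starts a daemon thread that refreshes the sample once per second.
    ScheduledExecutorService startResampling() {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "clock-resampler");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(this::resample, 1, 1, TimeUnit.SECONDS);
        return ses;
    }
}
```

Between two samples the result never goes backwards, since it only depends on
nanoTime. A resample can still move it back slightly if the wall clock was
adjusted, which is the case the existing handling in QueryState is there for.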

> Provide timestamp with true microsecond resolution
> --------------------------------------------------
>
>                 Key: CASSANDRA-6106
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6106
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: DSE Cassandra 3.1, but also HEAD
>            Reporter: Christopher Smith
>            Assignee: Benedict
>            Priority: Minor
>              Labels: timestamps
>             Fix For: 2.1 beta2
>
>         Attachments: microtimstamp.patch, microtimstamp_random.patch, 
> microtimstamp_random_rev2.patch
>
>
> I noticed this blog post: http://aphyr.com/posts/294-call-me-maybe-cassandra 
> mentioned issues with millisecond rounding in timestamps and was able to 
> reproduce the issue. If I specify a timestamp in a mutating query, I get 
> microsecond precision, but if I don't, I get timestamps rounded to the 
> nearest millisecond, at least for my first query on a given connection, which 
> substantially increases the possibilities of collision.
> I believe I found the offending code, though I am by no means sure this is 
> comprehensive. I think we probably need a fairly comprehensive replacement of 
> all uses of System.currentTimeMillis() with System.nanoTime().



--
This message was sent by Atlassian JIRA
(v6.2#6252)
