[ https://issues.apache.org/jira/browse/CASSANDRA-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976949#comment-13976949 ]
Sylvain Lebresne commented on CASSANDRA-6106:
---------------------------------------------

I'd like to summarize my understanding of what we're trying to fix here.

As far as conflict resolution goes, microsecond resolution is imo rather useless. Given the accuracy of NTP, network latencies and whatnot, no application should ever rely on sub-millisecond resolution for conflicts. Any application that relies on fine-grained ordering of updates to a cell should really provide client-side timestamps. That doesn't mean we can't use microsecond resolution if it's easy, of course, but it does mean that imo the bar for how much complexity is worth it is rather low.

This was not the original motivation of this ticket, however. The original motivation was to limit the chance of two updates A and B getting the exact same timestamp, because when that happens we could end up with some cells from A and some cells from B. I think we all agreed that the proper fix for that is more complicated and was left to CASSANDRA-6123. Yet, as I said earlier, since that fix is much more complicated, I'm fine with lowering the chances of timestamp conflicts in the meantime if that's easy for us (less often broken is somewhat better than more often broken, even if never broken is obviously better). But for this point, Christopher's solution of randomizing the microsecond bits was actually really simple and probably good enough.

And to be honest, the complexity of Benedict's branch is above what I consider reasonable for the concrete problem at hand. I'm surely not very smart, but it doesn't fit my own definition of straightforward. I'm not saying it's the most complicated thing ever, but it's complicated enough to make me uncomfortable, given that even a simple rounding error on the timestamp could basically destroy user data. I'm also not convinced we need that complexity in practice.
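For illustration, randomizing the microsecond bits can be sketched roughly as follows. This is a minimal sketch assuming the sub-millisecond part is simply filled with a uniform random value in 0..999; it is not the contents of microtimstamp_random.patch, and the class and method names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not Cassandra code: millisecond wall-clock time scaled
// to microseconds, with the sub-millisecond digits filled with random values
// instead of always being zero.
public final class RandomizedMicros {
    private RandomizedMicros() {}

    public static long timestampMicros() {
        long millis = System.currentTimeMillis();
        // Uniform value in [0, 999]: the "random microsecond bits".
        long randomMicros = ThreadLocalRandom.current().nextLong(1000);
        return millis * 1000 + randomMicros;
    }

    public static void main(String[] args) {
        System.out.println(timestampMicros());
    }
}
```

With uniform random bits, two writes landing in the same millisecond get identical timestamps roughly 1 time in 1000 instead of every time. That is the "less often broken" trade-off: it says nothing meaningful about ordering within the millisecond.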
What about just having a thread call clock_gettime followed by nanoTime every second or so, and then computing the current time as that clock_gettime reading plus the nanoTime elapsed since it was taken? It might not give the most precise timestamp possible, but it's imo largely good enough for our purpose (and clocks going back in time we already handle in a brute-force kind of way in QueryState, which is again imo good enough).

> Provide timestamp with true microsecond resolution
> --------------------------------------------------
>
> Key: CASSANDRA-6106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6106
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: DSE Cassandra 3.1, but also HEAD
> Reporter: Christopher Smith
> Assignee: Benedict
> Priority: Minor
> Labels: timestamps
> Fix For: 2.1 beta2
>
> Attachments: microtimstamp.patch, microtimstamp_random.patch, microtimstamp_random_rev2.patch
>
>
> I noticed this blog post: http://aphyr.com/posts/294-call-me-maybe-cassandra
> mentioned issues with millisecond rounding in timestamps and was able to
> reproduce the issue. If I specify a timestamp in a mutating query, I get
> microsecond precision, but if I don't, I get timestamps rounded to the
> nearest millisecond, at least for my first query on a given connection, which
> substantially increases the possibilities of collision.
> I believe I found the offending code, though I am by no means sure this is
> comprehensive. I think we probably need a fairly comprehensive replacement of
> all uses of System.currentTimeMillis() with System.nanoTime().

--
This message was sent by Atlassian JIRA
(v6.2#6252)
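The periodically re-anchored clock suggested in the comment can be sketched in Java as below. This rests on several assumptions. Java has no direct clock_gettime binding, so System.currentTimeMillis() stands in for the wall-clock read. The one-second refresh interval is taken from the comment. The monotonic guard is only in the spirit of the brute-force handling attributed to QueryState, not a copy of it. AnchoredMicrosClock and its methods are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Cassandra code.
public final class AnchoredMicrosClock {
    // Wall time and nanoTime captured together, published as one immutable
    // object so readers never mix values from two different samples.
    private static final class Anchor {
        final long wallMicros;
        final long nanos;
        Anchor(long wallMicros, long nanos) {
            this.wallMicros = wallMicros;
            this.nanos = nanos;
        }
    }

    private volatile Anchor anchor = sample();
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);
    private final ScheduledExecutorService refresher =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "clock-anchor");
            t.setDaemon(true);
            return t;
        });

    public AnchoredMicrosClock() {
        // Re-read the wall clock roughly once per second.
        refresher.scheduleAtFixedRate(() -> anchor = sample(), 1, 1, TimeUnit.SECONDS);
    }

    private static Anchor sample() {
        // Assumption: currentTimeMillis() stands in for clock_gettime. It only
        // has millisecond resolution; the sub-millisecond part comes from nanoTime.
        return new Anchor(System.currentTimeMillis() * 1000, System.nanoTime());
    }

    public long currentTimeMicros() {
        Anchor a = anchor;
        long now = a.wallMicros + (System.nanoTime() - a.nanos) / 1000;
        // Brute-force monotonic guard: never return a value <= the previous one.
        return last.updateAndGet(prev -> now > prev ? now : prev + 1);
    }

    public void shutdown() {
        refresher.shutdownNow();
    }
}
```

Because each wall-clock sample here is truncated to milliseconds, a refresh can move the extrapolated time back by up to about a millisecond. The guard absorbs that by handing out previous + 1 until real time catches up, so callers still see strictly increasing values.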