[ https://issues.apache.org/jira/browse/CASSANDRA-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040168#comment-14040168 ]
Benedict commented on CASSANDRA-6108:
-------------------------------------

Is this approach inherently incompatible with client-provided timestamps? As far as replacing timestamps is concerned, anyway; not necessarily as a datatype.

I think solving this problem properly is going to be very challenging, but I'd like to propose the following rough sketch of a solution. Note that this doesn't solve timeid64 so much as provide mostly-unique cluster-wide timestamps, in 64 bits or less, that can be generated by the client:

# I propose each client auto-generates a 20-bit id on startup. We can try to make this guaranteed unique, but I think a random number is probably sufficient.
# We define rolling epochs, each ~6 days apart, which is ~half the addressable ms interval in 32-bits, i.e. given any full ms time we split it into its most recent epoch plus its delta from that epoch.
# Each client then produces a timestamp that is 32 bits of current time (in millis) since the most recent epoch, a local monotonically increasing 14-bit value that is reset each ms, and its unique id.

On the cluster we ensure memtables are flushed at least once per epoch, with the epoch appearing in the sstable metadata, and we consider a full timestamp to be the stored timestamp combined with the epoch. Once all data prior to an epoch is fully repaired, we can optionally save 32 bits per cell by stripping out the per-node and monotonically increasing timestamp values on compaction.

The added complexity, as far as I can tell, will be in repairs, hints and compaction, which need to ensure they compare a 96-bit timestamp instead of a 64-bit one. But in compaction at least this might actually simplify matters, as reconcile knows in advance which sstables it prefers data from.

It's a pretty non-trivial change, and needs some further thought, but I think only non-trivial solutions are probably going to work for this non-trivial problem.
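To make the three steps above concrete, here is a minimal Python sketch of the client-side generator. It is an illustration under assumptions, not code from the ticket: the names (`ClientTimestampGenerator`, `split_epoch`, `full_timestamp`), the field order (time delta in the high bits, then counter, then client id), counting epochs as fixed 6-day windows from the Unix epoch, and the handling of a stalled or backwards clock are all hypothetical choices. Note that the widths as stated (32 + 14 + 20) total 66 bits, so the sketch holds the value in an arbitrary-precision int rather than assuming it fits a 64-bit word.

```python
import random
import time

# Bit widths as stated in the comment (they sum to 66 bits).
DELTA_BITS = 32    # ms since the most recent epoch
COUNTER_BITS = 14  # per-ms monotonic counter
CLIENT_BITS = 20   # client id chosen at startup

# Assumption: epochs are fixed-length windows counted from the Unix epoch.
EPOCH_MS = 6 * 24 * 60 * 60 * 1000  # "~6 days"


def split_epoch(now_ms):
    """Split a full ms time into (epoch index, delta from that epoch's start)."""
    return divmod(now_ms, EPOCH_MS)


class ClientTimestampGenerator:
    """Hypothetical client-side generator for the proposed timestamps."""

    def __init__(self, client_id=None, clock=None):
        # A random 20-bit id, as the comment suggests is probably sufficient.
        self.client_id = (
            random.getrandbits(CLIENT_BITS) if client_id is None else client_id
        )
        self.clock = clock or (lambda: time.time_ns() // 1_000_000)
        self.last_ms = -1
        self.counter = 0

    def next(self):
        now_ms = self.clock()
        if now_ms <= self.last_ms:
            # Same ms (or the clock went backwards): reuse last_ms and
            # bump the counter so values stay monotonic. (Assumption.)
            now_ms = self.last_ms
            self.counter += 1
            if self.counter >= 1 << COUNTER_BITS:
                # Counter exhausted within this ms: borrow the next ms.
                now_ms += 1
                self.counter = 0
        else:
            # New millisecond: the counter resets.
            self.counter = 0
        self.last_ms = now_ms

        epoch, delta = split_epoch(now_ms)
        stamp = (
            (delta << (COUNTER_BITS + CLIENT_BITS))
            | (self.counter << CLIENT_BITS)
            | self.client_id
        )
        return epoch, stamp


def full_timestamp(epoch, stamp):
    """Composite ordering key: epoch first, then the stored timestamp."""
    return (epoch, stamp)
```

Comparing `full_timestamp(...)` tuples orders writes first by epoch and then by the stored value, which mirrors the idea of treating the epoch from the sstable metadata plus the per-cell timestamp as one wider timestamp.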
Some possible safety optimisations with this solution might include refusing client timestamps that are not within some sensible skew from now, e.g. within 1 day or 1 hour. This would give a high degree of confidence that the cluster is sufficiently in sync, since old timestamps should only appear during client retries, which should not be so badly delayed. We could also move to micros time with this solution if some users require it (which no doubt some will), with narrower epochs.

> Create timeid64 type
> --------------------
>
> Key: CASSANDRA-6108
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6108
> Project: Cassandra
> Issue Type: New Feature
> Components: API, Core
> Reporter: Jonathan Ellis
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 2.1.1
>
>
> As discussed in CASSANDRA-6106, we could create a 64-bit type with 48 bits of
> timestamp and 16 bits of unique coordinator id. This would give us a
> unique-per-cluster value that could be used as a more compact replacement for
> many TimeUUID uses.

--
This message was sent by Atlassian JIRA
(v6.2#6252)