[ https://issues.apache.org/jira/browse/CASSANDRA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125709#comment-13125709 ]
Sylvain Lebresne commented on CASSANDRA-1034:
---------------------------------------------

I'm not sure I was completely clear, so allow me to try to improve on that. I think there are two issues with the last patch that are largely orthogonal:
# the patch is broken (again, this is largely not related to the shoving of DK into Token)
# I believe shoving DK into Token is a bad, error-prone idea that has no tangible advantage

But let me first focus on the first issue, if only because it involves no opinion whatsoever: I'm either right or wrong that it's broken (and I'm pretty sure I'm right). So let's be concrete and take an example. The patch "merges" Token and DecoratedKey together, so let me use the following notation:
* t(1, a) for what in the current code is the DecoratedKey a of token 1, but is just a Token instance in the patch
* t(1) for the 'pure' token 1. In other words, it's a shortcut for t(1, EMPTY_BYTE_BUFFER) in the attached patch and corresponds to just a Token in the current code.

(As a side note, the fact that I have to speak of DecoratedKey and 'pure' token to explain this is imo yet another sign that melting everything into Token is a bad idea, but I digress.)

Since when Tokens are compared, the token is compared first and the key breaks ties on token equality, we have t(n) < t(n, k) whatever the token n and key k are (since t(n) is t(n, EMPTY_BYTE_BUFFER) and EMPTY_BYTE_BUFFER < k for any valid key k).

Let's now take an example of multiple keys having the same token, and say that I have the following keys in my cluster:
{noformat}
tokens |   1   |     2     |   3
keys   | a | b | c | d | e | f | g
{noformat}
In other words, a and b have the same token 1; c, d and e have the same token 2; and so on. The goal of this ticket is to support that situation correctly. So for instance, we should have that:
range_slice(start=t(1), end=t(3)) returns c, d, e, f and g
(because range_slice with tokens is start exclusive).
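For illustration, the merged comparison described above can be sketched like this (a minimal, self-contained sketch with invented names, not the patch's actual code):

```java
import java.util.Arrays;

// Hypothetical stand-in for the patch's merged Token/DecoratedKey type.
final class MergedToken implements Comparable<MergedToken> {
    final long token;   // simplified; real tokens depend on the partitioner
    final byte[] key;   // an empty array plays the role of EMPTY_BYTE_BUFFER

    MergedToken(long token, byte[] key) { this.token = token; this.key = key; }

    // t(n): a 'pure' token, i.e. t(n, EMPTY_BYTE_BUFFER)
    static MergedToken pure(long token) { return new MergedToken(token, new byte[0]); }

    @Override
    public int compareTo(MergedToken o) {
        int cmp = Long.compare(token, o.token);
        if (cmp != 0) return cmp;
        // on token equality, compare keys; the empty key sorts before any real
        // key, hence t(n) < t(n, k) for every key k
        return Arrays.compare(key, o.key);
    }

    public static void main(String[] args) {
        MergedToken pure1 = pure(1);
        MergedToken t1a = new MergedToken(1, "a".getBytes());
        System.out.println(pure1.compareTo(t1a) < 0); // prints "true": t(1) < t(1, a)
    }
}
```

The point of the sketch is only the ordering: a 'pure' token is indistinguishable from, and sorts as, the empty-key entry of its token.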
However, with the attached patch:
range_slice(start=t(1), end=t(3)) will return a, b, c, d and e

The reason is fairly simple: we have t(1) < t(1, k) for any k, and t(3) < t(3, k) for any k. Another way to put it is that it breaks our token ranges: if a node owns Range(t(1), t(3)), it is supposed to contain no key with token 1 and every key with token 3, but it fails at both. So it's broken.

Now there is something we could be tempted to do. We could make it so that t(n) > t(n, k) for any token n and any key k. But in turn that would break Bounds (i.e., start inclusive) of 'pure' tokens. That is, Bounds(t(1), t(2)) is supposed to include all keys with token 1, but if t(1) > t(1, k) for any key k, it won't include them.

One could argue however that this is still a solution, because I *think* that right now we never really use a Bounds of 'pure' tokens (more precisely, the current code does, but only in places where we are actually doing a range slice between keys). And I *think* that the functions that take a startKey, when fed a 'pure' token, only do start exclusive. So I suppose we could assert that we never create Bounds of Token, put some asserts here and there (in SSTableReader.getPosition() for instance) and go with that.

But imho this is a bad idea, because it's fragile and because it ignores a problem that may screw us up later. Why not fix it the right way now? What if tomorrow we do want to be able to query all the keys having a given token? That is, what if we want to query [t(1), t(1)]? We would not be able to, because if t(1) > t(1, k) for any k, then [t(1), t(1)] doesn't include anything.
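By contrast, here is a sketch of the kind of representation that makes [t(1), t(1)]-style queries expressible: each token gets a synthetic minimum and a synthetic maximum bound, sorting respectively before and after every real key with that token. The names are invented for illustration; the patch's actual types differ.

```java
import java.util.Arrays;

// Illustrative sketch only: a ring position that is either a real decorated
// key or a synthetic per-token bound sorting before/after every key with
// that token.
final class RingPos implements Comparable<RingPos> {
    final long token;
    final byte[] key;     // null for the synthetic bounds
    final boolean isMax;  // for bounds: false = before all keys, true = after all

    private RingPos(long token, byte[] key, boolean isMax) {
        this.token = token; this.key = key; this.isMax = isMax;
    }

    static RingPos forKey(long token, byte[] key) { return new RingPos(token, key, false); }
    static RingPos minBound(long token)           { return new RingPos(token, null, false); }
    static RingPos maxBound(long token)           { return new RingPos(token, null, true); }

    @Override
    public int compareTo(RingPos o) {
        int c = Long.compare(token, o.token);
        if (c != 0) return c;
        // on token equality, synthetic bounds sort strictly outside all real keys
        if (key == null && o.key == null) return Boolean.compare(isMax, o.isMax);
        if (key == null) return isMax ? 1 : -1;
        if (o.key == null) return o.isMax ? -1 : 1;
        return Arrays.compare(key, o.key);
    }

    public static void main(String[] args) {
        RingPos lo = minBound(1), hi = maxBound(1);
        RingPos a = forKey(1, "a".getBytes());
        // the inclusive range [minBound(1), maxBound(1)] captures every key
        // with token 1, and nothing else
        System.out.println(lo.compareTo(a) < 0 && a.compareTo(hi) < 0); // prints "true"
    }
}
```

With a single empty-key representation per token, no pair of endpoints can select exactly the keys of one token; with two bounds per token, [minBound(n), maxBound(n)] does so directly.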
Again, all this is because a token actually corresponds to a set of keys (once you acknowledge that multiple keys can have the same token), and so if you want to do things correctly, you need, for a given token n, a representation for both:
* the smallest key having token n
* the greatest key having token n

With that, you can query all the keys having token n. Without it, you can't. That is what my patch does, and I believe fairly strongly that it is the right way to do it.

Alright, that's the first thing that a patch to this ticket must deal with. Then there is another thing: the current code only allows for AbstractBounds of Token (by typing), but for this patch we want that once you do a range_slice query with a startKey and an endKey, you get a range of keys in ColumnFamilyStore.getRangeSlice(), so that you can precisely answer those queries. That means we must be able to construct AbstractBounds with keys in them. Note that it's basically just a typing problem. The answer of this patch is to shove DK into Token. So yeah, it fixes that problem, but what I'm arguing is that:
* It's error-prone and makes coding *more* complicated. We're merging objects that are not the same thing. Today, if a method takes a Token, you know it won't do anything at the granularity of keys (well, today Token and keys have the same granularity, but this ticket is supposed to change that). You lose that information if you merge DK and Token. And if a method takes a DecoratedKey, you know that it doesn't expect a Token (more precisely, typing ensures it). Sure, we do already use a trick in a few places where we create a 'fake' DK(null, key). But at the very least, when we do that, we know we're doing something weird, and we are extra careful that the methods we call on that fake DK handle that case correctly. If everything is a Token, the signatures of a lot of methods will now suggest it is ok to pass a 'pure' Token. So what, should every method defensively assert this is not the case?
This is what types are for.
* It's a ~300K patch. Sure, it's mostly simple changes, but it's still that many changes that could introduce a typo somewhere that causes a bug.
* It's a tad less efficient, because each time we really only care about a 'pure' Token (and there is quite a bunch of code that does), we would allocate a slightly larger structure for no reason. And I'm pretty sure god kills a kitten each time you do that.

The solution to that very same typing problem I'm proposing (in my patch) is instead simply to generalize AbstractBounds slightly, so that you can have both AbstractBounds of Token and of DecoratedKey. That sounds very reasonable to me. After all, we should be able to have AbstractBounds of anything that implements Comparable, right? Well, as it turns out, our implementation of AbstractBounds needs a little more than that (because our ranges wrap, we need Comparable but with a minimum value, for instance), and that is what RingPosition is for. But it's only a marker interface, and if you look at the code it's actually used in a small number of places, so I admit I fail to see how this makes things much more complicated.

> Remove assumption that Key to Token is one-to-one
> -------------------------------------------------
>
>                 Key: CASSANDRA-1034
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1034
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Make-range-accept-both-Token-and-DecoratedKey.patch, 0002-LengthPartitioner.patch, 1034-1-Generify-AbstractBounds-v3.patch, 1034-2-Remove-assumption-that-token-and-keys-are-one-to-one-v3.patch, 1034_v1.txt, CASSANDRA-1034.patch
>
>
> get_range_slices assumes that Tokens do not collide and converts a KeyRange to an AbstractBounds. For RandomPartitioner, this assumption isn't safe, and would lead to a very weird heisenberg.
> Converting AbstractBounds to use a DecoratedKey would solve this, because the byte[] key portion of the DecoratedKey can act as a tiebreaker.
> Alternatively, we could make DecoratedKey extend Token, and then use DecoratedKeys in places where collisions are unacceptable.