[ 
https://issues.apache.org/jira/browse/CASSANDRA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125709#comment-13125709
 ] 

Sylvain Lebresne commented on CASSANDRA-1034:
---------------------------------------------

I'm not sure I was completely clear, so allow me to try to improve that. I 
think there are two issues with the last patch, and they are largely orthogonal:
  # the patch is broken (and again, this is largely unrelated to the shoving of 
DK into Token)
  # I believe shoving DK into Token is a bad, error-prone idea that has no 
tangible advantage

But let me first focus on the first issue, if only because it involves no 
opinion whatsoever: I'm either right or wrong that it's broken (and I'm pretty 
sure I'm right). So let's be concrete and take an example.

The patch "merges" Token and DecoratedKey together, so let me use the 
following notation:
  * t(1, a) for what is the DecoratedKey a of token 1 in the current code, but 
is just a Token instance in the patch
  * t(1) for the 'pure' token 1. In other words, it's a shortcut for t(1, 
EMPTY_BYTE_BUFFER) in the attached patch and corresponds to just a Token in the 
current code.
(as a side note, the fact that I have to speak of DecoratedKey and 'pure' 
tokens to explain this is imo yet another sign that melting everything into 
Token is a bad idea, but I digress)

Since comparing two Tokens compares the tokens first and falls back to the 
keys on token equality, we have t(n) < t(n, k) whatever the token n and key k 
are (since t(n) is t(n, EMPTY_BYTE_BUFFER) and EMPTY_BYTE_BUFFER < k for any 
valid key k).
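The ordering just described can be sketched as follows. This is a minimal, hypothetical reconstruction (the class and method names are illustrative, not the patch's actual code), assuming the merged type compares the token first and the key bytes second:

```java
import java.nio.ByteBuffer;

// Illustrative sketch of the ordering implied by the patch: compare the token
// first, then fall back to the key bytes on token equality, with an empty
// buffer standing in for a 'pure' token.
class KeyedToken implements Comparable<KeyedToken> {
    static final ByteBuffer EMPTY = ByteBuffer.allocate(0);
    final long token;
    final ByteBuffer key;

    KeyedToken(long token, ByteBuffer key) { this.token = token; this.key = key; }

    // t(n) is t(n, EMPTY_BYTE_BUFFER)
    static KeyedToken pure(long token) { return new KeyedToken(token, EMPTY); }

    @Override
    public int compareTo(KeyedToken o) {
        int cmp = Long.compare(token, o.token);
        return cmp != 0 ? cmp : key.compareTo(o.key);  // EMPTY < any non-empty key
    }
}
```

Since ByteBuffer comparison is lexicographic and the empty buffer sorts before any non-empty one, this gives t(n) < t(n, k) for every key k, which is the property everything below hinges on.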

Let's now take an example of multiple keys having the same token and say that I 
have the following keys in my cluster:
{noformat}
tokens |   1   |     2     |   3
keys   | a | b | c | d | e | f | g
{noformat}
In other words, a and b have the same token 1; c, d and e have the same token 
2; ...

The goal of this ticket is to support that situation correctly. So for 
instance, we should have that:
   range_slice(start=t(1), end=t(3)) returns c, d, e, f and g
(because range_slice with tokens is start exclusive).  However, with the 
attached patch:
   range_slice(start=t(1), end=t(3)) will return a, b, c, d and e

The reason is fairly simple: t(1) < t(1, k) for any k, so a and b fall after 
the exclusive start, and t(3) < t(3, k) for any k, so f and g fall after the 
inclusive end.

Another way to put it is that it breaks our token ranges: a node that owns 
Range(t(1), t(3)) is supposed to contain no key with token 1 and all keys with 
token 3, but it fails at both.
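To make the failure concrete, here is a small self-contained demo (illustrative code, not Cassandra's actual Range class; strings stand in for key bytes) of the range_slice example above, with a start-exclusive, end-inclusive range under the merged ordering:

```java
import java.util.ArrayList;
import java.util.List;

// Demo of why Range(t(1), t(3)) selects the wrong keys under the merged
// ordering, where the 'pure' token t(n) sorts before every t(n, k).
class BrokenRangeDemo {
    // Cluster from the example: a,b have token 1; c,d,e token 2; f,g token 3.
    static final long[] TOKENS = {1, 1, 2, 2, 2, 3, 3};
    static final String[] KEYS = {"a", "b", "c", "d", "e", "f", "g"};

    // Compare (token, key) pairs; the empty string stands for a 'pure' token.
    static int cmp(long t1, String k1, long t2, String k2) {
        int c = Long.compare(t1, t2);
        return c != 0 ? c : k1.compareTo(k2);
    }

    // range_slice(start=t(startToken), end=t(endToken)):
    // start exclusive, end inclusive.
    static List<String> rangeSlice(long startToken, long endToken) {
        List<String> returned = new ArrayList<>();
        for (int i = 0; i < KEYS.length; i++) {
            boolean afterStart = cmp(TOKENS[i], KEYS[i], startToken, "") > 0;
            boolean beforeEnd  = cmp(TOKENS[i], KEYS[i], endToken, "") <= 0;
            if (afterStart && beforeEnd)
                returned.add(KEYS[i]);
        }
        return returned;
    }

    public static void main(String[] args) {
        // Expected c,d,e,f,g; instead a,b slip in and f,g are excluded.
        System.out.println(rangeSlice(1, 3)); // [a, b, c, d, e]
    }
}
```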

So it's broken. Now there is something we could be tempted to do: we could 
make it so that t(n) > t(n, k) for any token n and any key k. But in turn that 
would break Bounds (i.e. start inclusive) of 'pure' tokens. That is, 
Bounds(t(1), t(2)) is supposed to include all keys with token 1, but if t(1) > 
t(1, k) for any key k, it won't include any of them.
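The symmetric failure can be demonstrated the same way. A small sketch (again illustrative, not Cassandra code), assuming the reversed ordering where a 'pure' token sorts after every key sharing its token:

```java
// Demo of the alternative ordering t(n) > t(n, k): a start-inclusive
// Bounds(t(1), t(2)) then misses every key with token 1.
class ReversedOrderingDemo {
    // side = +1 marks a 'pure' token, so it sorts after any real key (side = 0)
    // with the same token.
    static int cmp(long t1, int side1, String k1, long t2, int side2, String k2) {
        int c = Long.compare(t1, t2);
        if (c != 0) return c;
        c = Integer.compare(side1, side2);
        return c != 0 ? c : k1.compareTo(k2);
    }

    // Bounds(t(1), t(2)): start inclusive, end inclusive.
    static boolean inBounds(long keyToken, String key) {
        return cmp(1, +1, "", keyToken, 0, key) <= 0
            && cmp(keyToken, 0, key, 2, +1, "") <= 0;
    }

    public static void main(String[] args) {
        System.out.println(inBounds(1, "a")); // false: token 1, yet excluded
        System.out.println(inBounds(2, "c")); // true
    }
}
```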

One could argue however that this is still a solution, because I *think* that 
right now we never really use a Bounds of 'pure' tokens (more precisely, the 
current code does, but only in places where we are actually doing a range 
slice between keys). And I *think* that functions that take a startKey, when 
fed a 'pure' token, only do start exclusive. So I suppose we could assert that 
we never create Bounds of Token, put some asserts here and there (in 
SSTableReader.getPosition() for instance), and go with that.

But imho this is a bad idea, because it's fragile and because it ignores a 
problem that may screw us up later. Why not fix it the right way now? What if 
tomorrow we do want to be able to query all the keys having a given token? 
That is, what if we want to query [t(1), t(1)]? We would not be able to, 
because if t(1) > t(1, k) for any k, then [t(1), t(1)] doesn't include 
anything.

Again, all this is because a token actually corresponds to a set of keys (once 
you acknowledge that multiple keys can have the same token), and so if you want 
to do things correctly, you need, for a given token n, a representation for 
both:
  * the smallest key having token n
  * the greatest key having token n

With that, you can query all the keys having token n. Without it, you can't. 
That is what my patch does, and I believe fairly strongly it is the right way 
to do it.
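The two-bounds idea can be sketched roughly as follows. This is a hypothetical reconstruction (names and encoding are mine, not the patch's), assuming each token gets an artificial lower and upper bound around its keys:

```java
// Sketch of the representation argued for above: for a token n, a minimum
// bound sorting before every key with token n, and a maximum bound sorting
// after every such key, so [minBound(n), maxBound(n)] selects exactly the
// keys with token n.
class Position implements Comparable<Position> {
    final long token;
    final int side;    // -1: min bound, 0: real key, +1: max bound
    final String key;  // only meaningful when side == 0

    private Position(long token, int side, String key) {
        this.token = token; this.side = side; this.key = key;
    }

    static Position minBound(long t)        { return new Position(t, -1, ""); }
    static Position maxBound(long t)        { return new Position(t, +1, ""); }
    static Position key(long t, String k)   { return new Position(t, 0, k); }

    @Override
    public int compareTo(Position o) {
        int c = Long.compare(token, o.token);
        if (c != 0) return c;
        c = Integer.compare(side, o.side);
        return c != 0 ? c : key.compareTo(o.key);
    }
}
```

With this, the inclusive range [minBound(1), maxBound(1)] contains t(1, a) and t(1, b) but not t(2, c), which is exactly the "query all keys for a token" case the shove-DK-into-Token approach cannot express.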


Alright, that's the first thing a patch for this ticket must deal with. Then 
there is another thing: the current code only allows AbstractBounds of Token 
(by typing), but for this ticket we want that once you do a range_slice query 
with a startKey and an endKey, you get a range of keys in 
ColumnFamilyStore.getRangeSlice(), so that you can answer those queries 
precisely. That means we must be able to construct AbstractBounds with keys in 
them. Note that it's basically just a typing problem.

The answer of this patch is to shove DK into Token. So yeah, it fixes that 
problem, but what I'm arguing is that:
  * It's error-prone and makes coding *more* complicated. We're merging objects 
that are not the same thing. Today, if a method takes a Token, you know it 
won't do anything at the granularity of keys (well, today Token and keys have 
the same granularity, but this ticket is supposed to change that). You lose 
that information if you merge DK and Token. And if a method takes a 
DecoratedKey, you know that it doesn't expect a Token (more precisely, typing 
ensures it). Sure, we already use a trick in a few places where we create a 
'fake' DK(null, key). But at the very least, when we do that, we know we're 
doing something weird, and we are extra careful that the methods we call on 
that fake DK handle the case correctly. If everything is a Token, the 
signatures of a lot of methods will suggest it is ok to pass a 'pure' Token. 
So what, all methods should defensively assert this is not the case? This is 
what types are for.
  * It's a ~300K patch. Sure, it's mostly simple changes, but it's still that 
many changes in which a typo somewhere could introduce a bug.
  * It's a tad less efficient, because each time we really only care about 
'pure' Tokens (and quite a bit of code does), we would allocate a slightly 
larger structure for no reason. And I'm pretty sure god kills a kitten each 
time you do that.

The solution I'm proposing (in my patch) to that very same typing problem is 
instead simply to generalize AbstractBounds slightly, so you can have both 
AbstractBounds of Token and AbstractBounds of DecoratedKey. That sounds very 
reasonable to me. After all, we should be able to have AbstractBounds of 
anything that implements Comparable, right? Well, as it turns out, our 
implementation of AbstractBounds needs a little more than that (because our 
ranges wrap, we need Comparable but with a minimum value, for instance), and 
that is what RingPosition is for. But it's only a marker interface, and if you 
look at the code it's actually used in a small number of places, so I admit I 
fail to see how this makes things much more complicated.
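The typing generalization described above could look roughly like this. A heavily simplified sketch, not the patch itself (the real classes also handle wrapping ranges, serialization, and partitioner-specific minimum values):

```java
// A small interface that both Token and DecoratedKey can implement: Comparable
// plus the extra notion of a minimum value that wrapping ranges need.
interface RingPosition<T extends RingPosition<T>> extends Comparable<T> {
    boolean isMinimum();
}

// Bounds generified over any ring position instead of being hardwired to Token.
abstract class AbstractBounds<T extends RingPosition<T>> {
    final T left, right;
    AbstractBounds(T left, T right) { this.left = left; this.right = right; }
    abstract boolean contains(T position);
}

// Start-exclusive, end-inclusive, as for token ranges.
class Range<T extends RingPosition<T>> extends AbstractBounds<T> {
    Range(T left, T right) { super(left, right); }
    boolean contains(T p) { return left.compareTo(p) < 0 && p.compareTo(right) <= 0; }
}

// Start-inclusive, end-inclusive.
class Bounds<T extends RingPosition<T>> extends AbstractBounds<T> {
    Bounds(T left, T right) { super(left, right); }
    boolean contains(T p) { return left.compareTo(p) <= 0 && p.compareTo(right) <= 0; }
}

// A toy token type, just to show a RingPosition implementation.
class LongToken implements RingPosition<LongToken> {
    final long value;
    LongToken(long value) { this.value = value; }
    public boolean isMinimum() { return value == Long.MIN_VALUE; }
    public int compareTo(LongToken o) { return Long.compare(value, o.value); }
}
```

The point is that the callers barely change: code that today says AbstractBounds of Token keeps working with `AbstractBounds<Token>`, and the range-slice path can say `AbstractBounds<DecoratedKey>` where key granularity matters.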
                
> Remove assumption that Key to Token is one-to-one
> -------------------------------------------------
>
>                 Key: CASSANDRA-1034
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1034
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 
> 0001-Make-range-accept-both-Token-and-DecoratedKey.patch, 
> 0002-LengthPartitioner.patch, 1034-1-Generify-AbstractBounds-v3.patch, 
> 1034-2-Remove-assumption-that-token-and-keys-are-one-to-one-v3.patch, 
> 1034_v1.txt, CASSANDRA-1034.patch
>
>
> get_range_slices assumes that Tokens do not collide and converts a KeyRange 
> to an AbstractBounds. For RandomPartitioner, this assumption isn't safe, and 
> would lead to a very weird heisenberg.
> Converting AbstractBounds to use a DecoratedKey would solve this, because the 
> byte[] key portion of the DecoratedKey can act as a tiebreaker. 
> Alternatively, we could make DecoratedKey extend Token, and then use 
> DecoratedKeys in places where collisions are unacceptable.
