[ 
https://issues.apache.org/jira/browse/CASSANDRA-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-4383:
----------------------------------------

    Attachment: 4383-v1.txt

Attached is a first draft.  Some notes:

* we have to keep the first token as text for backward-compatibility, but after 
that we can encode the rest in binary.

* simply using binary is good enough, since CASSANDRA-4139 will give us varint 
encoding for free, and beyond that CASSANDRA-3127 gives us compression.
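
To put rough numbers on the text-versus-binary point, here is a minimal Java
sketch. It assumes RandomPartitioner-style tokens (non-negative integers below
2^127), a comma-joined decimal text form, and the fixed 16-byte width mentioned
in the ticket description. The class and method names are hypothetical and are
not taken from the patch.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class TokenSizeComparison {
    // Fixed width per token, as assumed in the ticket description.
    static final int TOKEN_BYTES = 16;

    // Hypothetical text form: decimal tokens joined by a comma.
    static int textSize(List<BigInteger> tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(tokens.get(i).toString());
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII).length;
    }

    // Fixed-length binary form: every token costs exactly TOKEN_BYTES.
    static int binarySize(List<BigInteger> tokens) {
        return tokens.size() * TOKEN_BYTES;
    }

    // Random 127-bit tokens, standing in for a node's vnode tokens.
    static List<BigInteger> randomTokens(int count, long seed) {
        Random rnd = new Random(seed);
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            tokens.add(new BigInteger(127, rnd));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<BigInteger> tokens = randomTokens(256, 42L);
        System.out.println("text bytes:   " + textSize(tokens));
        System.out.println("binary bytes: " + binarySize(tokens)); // 256 * 16 = 4096
    }
}
```

Varint encoding and compression would shrink both forms further, so this only
shows the baseline difference before those are applied.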

I say first draft, because while everything works with this patch, during the 
implementation I found some things that, while perhaps beyond the original 
scope of this ticket, are things we should probably address:

* the logic for serialization and deserialization is split between VV 
(VersionedValue) and SS (StorageService), which is kind of ugly, and we 
probably need to deserialize for gossipinfo, both to make its output useful 
and to avoid annoyingly flashing the terminal.  These are things I could have 
done; I just haven't yet.

* The game of appending things to STATUS and carefully splitting to avoid 
accidentally tripping over the VV delimiter is both something I'd like to stop 
doing, and slightly dangerous.

* Since VV uses strings, we have to use the latin-1 codepage to pass the binary 
tokens to avoid having any bytes eaten by string encoding.  This is a bit 
hackish.
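
For anyone wondering why latin-1 in particular: ISO-8859-1 maps every byte
value 0-255 to exactly one character, so arbitrary bytes survive a trip
through a Java String, whereas a charset like UTF-8 replaces invalid
sequences. The sketch below illustrates that property only; the class and
method names are hypothetical and are not the code in the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Smuggle raw bytes through a String: one char per byte, nothing dropped.
    static String toLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Recover the original bytes from the latin-1 String.
    static byte[] fromLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Same trip through UTF-8, to show where bytes get eaten.
    static boolean survivesUtf8(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        return Arrays.equals(raw, s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] raw = new byte[256];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i; // every possible byte value
        }
        boolean latin1Ok = Arrays.equals(raw, fromLatin1(toLatin1(raw)));
        System.out.println("latin-1 round trip intact: " + latin1Ok);
        System.out.println("UTF-8 round trip intact:   " + survivesUtf8(raw));
    }
}
```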

To solve the STATUS and pieces[] problem, I suggest we stop appending things to 
it right now.  Currently LEAVING is the one-off where HOST_ID is NOT included, 
and there's nothing we can do about that while maintaining compatibility.  So 
what I suggest is we make that the norm, and promote HOST_ID to a new 
ApplicationState, which will simplify the "do I need to look for a hostId?" 
checks since the state will be guaranteed to be there for new-style nodes.  
Similarly, I think we should promote the serialized tokens to a TOKENS 
ApplicationState, so we can stop deftly avoiding tripping over our string 
delimiter there.  Old-style nodes will still do the split on STATUS and 
we'll keep putting the first token there for that, but new-style nodes can 
process TOKENS directly and safely.
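
As a rough illustration of why a dedicated TOKENS state sidesteps the
delimiter problem: if each token is length-prefixed, the payload never has to
be split on a character, so a token byte that happens to equal the delimiter
(for example 0x2C, a comma) is harmless. The wire format below (an int count,
then an int length plus raw bytes per token) is an assumption made up for this
sketch, not the format used by the patch, and the names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class TokensCodec {
    // Encode as: [int count] then, per token, [int length][token bytes].
    static byte[] encode(List<BigInteger> tokens) {
        List<byte[]> parts = new ArrayList<>();
        int size = Integer.BYTES;
        for (BigInteger token : tokens) {
            byte[] b = token.toByteArray();
            parts.add(b);
            size += Integer.BYTES + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(parts.size());
        for (byte[] b : parts) {
            buf.putInt(b.length);
            buf.put(b);
        }
        return buf.array();
    }

    // Decode by reading lengths, never by scanning for a delimiter.
    static List<BigInteger> decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        int count = buf.getInt();
        List<BigInteger> tokens = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] b = new byte[buf.getInt()];
            buf.get(b);
            tokens.add(new BigInteger(b));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<BigInteger> tokens = new ArrayList<>();
        tokens.add(BigInteger.valueOf(0x2C)); // encodes to a single ',' byte
        tokens.add(BigInteger.ONE.shiftLeft(126));
        System.out.println("round trip intact: " + decode(encode(tokens)).equals(tokens));
    }
}
```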

Finally, to avoid the latin-1 hack, we should probably think about converting 
VV to accept and write bytes directly.

Thoughts?
                
> Binary encoding of vnode tokens
> -------------------------------
>
>                 Key: CASSANDRA-4383
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4383
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 1.2
>
>         Attachments: 4383-v1.txt
>
>
> Since after CASSANDRA-4317 we can know which version a remote node is using 
> (that is, whether it is vnode-aware or not) this is a good opportunity to 
> change the token encoding to binary, since with a default of 256 tokens per 
> node even a fixed-length 16-byte encoding per token provides a great deal of 
> savings in gossip traffic over a text representation.
