[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589470#comment-14589470
 ] 

Benedict commented on CASSANDRA-9499:
-------------------------------------

I'm confused as to why we need 10 bytes. Pretty much by definition, a 
continuation-bit encoding needs at most 9 bytes to represent an 8-byte value. 
Let's not use Google's implementation. It looks pretty rubbish. 

The reason they use 10 bytes is they cannot be bothered to realise the last 
byte does not need a continuation bit. They also use a *terrible* 
implementation for deciding how long it needs to be.
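
To make the 10-versus-9 arithmetic concrete, here is a minimal sketch. The 
class and method names are made up for illustration and are not taken from 
any existing library. It assumes 7 payload bits per byte when every byte 
carries a continuation bit, and 8 payload bits in the ninth byte when that 
byte is known to be the last.

```java
final class VarintSizes {
    // Bytes needed when every byte spends one bit on continuation,
    // i.e. 7 payload bits per byte. A full 64-bit value needs ceil(64 / 7) = 10.
    static int sevenBitGroups(long value) {
        int bits = 64 - Long.numberOfLeadingZeros(value | 1); // treat 0 as 1 bit
        return (bits + 6) / 7;
    }

    // Bytes needed if the ninth byte is always the last one and therefore
    // carries 8 payload bits: 8 * 7 + 8 = 64 bits fit in 9 bytes.
    static int cappedAtNine(long value) {
        return Math.min(sevenBitGroups(value), 9);
    }
}
```

For example, {{VarintSizes.sevenBitGroups(-1L)}} gives 10, whereas 
{{VarintSizes.cappedAtNine(-1L)}} gives 9.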

Here's a simple change which makes it 9 bytes, and easily optimised: the 
continuation bits are all shifted to the first byte, which effectively encodes 
the length as a run of set bits, i.e. the number of contiguous high-order bits 
that are set to 1: {{length = Integer.numberOfLeadingZeros(~firstByte) - 24}}, 
where firstByte is a byte (sign-extended to int) and length counts the bytes 
that follow it.

This way we read the first byte, and if there are any more to read, we read a 
long, and quickly truncate.
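
A rough sketch of the scheme described above, for unsigned values only. The 
class and method names are hypothetical, and the sketch assumes the extra 
bytes are written big-endian after the first byte. For simplicity it reads 
the extra bytes one at a time rather than reading a long and truncating. 
Because firstByte is sign-extended to int, the number of leading 1 bits is 
computed as {{Integer.numberOfLeadingZeros(~firstByte) - 24}}.

```java
import java.nio.ByteBuffer;

final class VIntSketch {
    // Total encoded size in bytes (1..9): 7 payload bits per byte for sizes
    // up to 8, and the full 64 bits once a ninth byte is needed.
    static int encodedSize(long value) {
        int bits = 64 - Long.numberOfLeadingZeros(value | 1);
        return Math.min((bits + 6) / 7, 9);
    }

    static void writeUnsignedVInt(long value, ByteBuffer out) {
        int extraBytes = encodedSize(value) - 1;
        // The top extraBytes bits of the first byte are set to 1.
        int lengthPrefix = (0xFF00 >> extraBytes) & 0xFF;
        // With 8 extra bytes the first byte is all prefix; a shift by 64
        // would wrap around in Java, so that case is handled explicitly.
        int firstBytePayload = extraBytes == 8 ? 0 : (int) (value >>> (8 * extraBytes));
        out.put((byte) (lengthPrefix | firstBytePayload));
        for (int i = extraBytes - 1; i >= 0; i--)
            out.put((byte) (value >>> (8 * i)));
    }

    static long readUnsignedVInt(ByteBuffer in) {
        byte firstByte = in.get();
        if (firstByte >= 0)
            return firstByte; // top bit clear: the value fits in this byte
        // Count the leading 1 bits of the byte; subtract the 24 bits of
        // sign extension that the int promotion adds.
        int extraBytes = Integer.numberOfLeadingZeros(~firstByte) - 24;
        long value = firstByte & (0xFF >> extraBytes); // strip the length prefix
        for (int i = 0; i < extraBytes; i++)
            value = (value << 8) | (in.get() & 0xFF);
        return value;
    }
}
```

With this layout the reader knows the full length after a single byte, so 
there is no per-byte branch on a continuation bit.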

> Introduce writeVInt method to DataOutputStreamPlus
> --------------------------------------------------
>
>                 Key: CASSANDRA-9499
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>            Priority: Minor
>             Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
