[ 
https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587908#comment-14587908
 ] 

Benedict commented on CASSANDRA-9499:
-------------------------------------

bq. I assumed that knowing a given integer can't be negative could lead to more 
efficient encoding

Well, we could have two different encodings, but with the current scheme this 
would mostly help values in the range of [128..250). I'm not sure if that's 
worth confusing everything for.

However, if we change the encoding, we can bias towards positive encodings, 
since they're more common. I'm somewhat inclined to use a hybrid extending-bits 
scheme. A starting suggestion:

* first byte: 2 bits of length; followed by, if either of those length bits is 
set, 1 sign bit; followed by, if both length bits are set, 2 more bits of 
length; the remainder (3-6 bits) are value bits
* all remaining bytes contain value bits only

This would lead to the following encoding sizes
||value range||suggested scheme size||existing scheme size||
|0..63|1|1|
|64..8K|2|mostly 3, 64..128=1, 128..256=2|
|-8192..0|2|mostly 3, -112..0=1, -256..-112=2|
|8K..2M|3|mostly 4|

etc.
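
To make the layout and the size table concrete, here is a minimal sketch of an encoder for the 1-3 byte cases only. Everything not stated in the suggestion is an assumption made for illustration: the class and method names are hypothetical, the length bits are taken to be the top two bits of the first byte and to equal the number of extra bytes, value bits are written big-endian, and a negative value stores {{~v}} with the sign bit set (which is what lets -8192 fit in 2 bytes). The extended case (both length bits set) is left out because its layout is not spelled out here.

```java
// Hypothetical sketch of the suggested scheme; not the committed implementation.
final class SuggestedVIntSketch {

    /** Encodes v if it fits in 1-3 bytes under the assumptions listed above. */
    static byte[] encode(long v) {
        if (v >= 0 && v < (1L << 6)) {
            // Length bits 00: no sign bit, six value bits in the first byte.
            return new byte[] { (byte) v };
        }

        boolean negative = v < 0;
        // Assumption: negatives store the complement, so -1 has magnitude 0.
        long magnitude = negative ? ~v : v;

        int extraBytes;
        if (magnitude < (1L << 13)) {
            extraBytes = 1;   // 5 value bits in the first byte + 8 more
        } else if (magnitude < (1L << 21)) {
            extraBytes = 2;   // 5 value bits in the first byte + 16 more
        } else {
            throw new IllegalArgumentException(
                "needs the extended length bits, which this sketch omits: " + v);
        }

        byte[] out = new byte[1 + extraBytes];
        int first = (extraBytes << 6)                         // length bits 01 or 10
                  | (negative ? 1 << 5 : 0)                   // sign bit
                  | (int) (magnitude >>> (8 * extraBytes));   // top five value bits
        out[0] = (byte) first;
        // All remaining bytes carry value bits only, most significant first.
        for (int i = 1; i <= extraBytes; i++) {
            out[i] = (byte) (magnitude >>> (8 * (extraBytes - i)));
        }
        return out;
    }
}
```

Under these assumptions, 0..63 encode to 1 byte, 64..8191 and -8192..-1 to 2 bytes, and values up to 2M in magnitude to 3 bytes, matching the table above. For example, 64 becomes the two bytes 0x40 0x40.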

So, we basically lose out a small amount for values in the range 64..128, and 
-256..-1. Everything else we gain. If we wanted to further bias towards 
positive encoding, we could require that the length bits indicate at least 3 
bytes for the sign bit to be present, so that negative numbers cannot be 
encoded in fewer than 3 bytes.

> Introduce writeVInt method to DataOutputStreamPlus
> --------------------------------------------------
>
>                 Key: CASSANDRA-9499
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>            Priority: Minor
>             Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, for both fixing 
> CASSANDRA-9498 but also efficiently encoding timestamp/deletion deltas. It 
> should be possible to make an especially efficient implementation against 
> BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
