[ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589470#comment-14589470 ]
Benedict edited comment on CASSANDRA-9499 at 6/17/15 8:47 AM:
--------------------------------------------------------------

I'm confused as to why we need 10 bytes. Pretty much by definition, a continuation bit encoding needs 9 bytes to represent 8 bytes.

Let's not use Google's implementation. It looks pretty rubbish. The reason they use 10 bytes is that they cannot be bothered to realise the last byte does not need a continuation bit. They also use a *terrible* implementation for deciding how long it needs to be.

Here's a simple change which makes it 9 bytes, and is easily optimised: the continuation bits are all shifted to the first byte, which effectively encodes the length as a run of 1 bits, i.e. the number of contiguous high-order bits that are set to 1, so {{length = Integer.numberOfLeadingZeros(firstByte ^ 0xffff)}}

This way we read the first byte and, if there are any more bytes to read, we read a long and quickly truncate it.

was (Author: benedict):
I'm confused as to why we need 10 bytes. Pretty much by definition, a continuation bit encoding needs 9 bytes to represent 8 bytes.

Let's not use Google's implementation. It looks pretty rubbish. The reason they use 10 bytes is that they cannot be bothered to realise the last byte does not need a continuation bit. They also use a *terrible* implementation for deciding how long it needs to be.

Here's a simple change which makes it 9 bytes, and is easily optimised: the continuation bits are all shifted to the first byte, which effectively encodes the length as a run of 1 bits, i.e. the number of contiguous high-order bits that are set to 1, so {{length = Integer.numberOfLeadingZeros(firstByte ^ (byte) -1)}}

This way we read the first byte and, if there are any more bytes to read, we read a long and quickly truncate it.
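A minimal sketch of the length-prefixed scheme described in the comment, written against `java.nio.ByteBuffer`. The class and method names (`PrefixVInt`, `prefixSize`, `readUnsignedVInt`, ...) are hypothetical and this is not Cassandra's actual implementation. The number of extra bytes is the run of leading 1 bits in the first byte, terminated by a 0 bit unless all 8 bits are set, in which case a full 8-byte value follows. Because `Integer.numberOfLeadingZeros` counts over 32 bits, the sketch inverts the byte, masks it to 8 bits and subtracts 24; the one-line expression in the comment is shorthand and would need that adjustment to yield the count directly. The decoder follows the comment's suggestion of reading a whole long and truncating it when enough bytes remain.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of a length-prefixed unsigned vint: the count of extra
 * bytes is a run of leading 1 bits in the first byte, so a 64-bit value needs
 * at most 9 bytes instead of the 10 a 7-bits-per-byte continuation encoding uses.
 */
final class PrefixVInt {

    /** Bytes a classic continuation-bit varint (7 payload bits per byte) would use. */
    static int continuationSize(long value) {
        int magnitude = 64 - Long.numberOfLeadingZeros(value | 1); // significant bits, at least 1
        return (magnitude + 6) / 7;                                 // up to 10 for 64 bits
    }

    /** Bytes the prefix scheme uses: 7 payload bits per byte up to 8 bytes, then a flat 9. */
    static int prefixSize(long value) {
        return Math.min(continuationSize(value), 9);
    }

    static void writeUnsignedVInt(long value, ByteBuffer out) {
        int extraBytes = prefixSize(value) - 1;
        if (extraBytes == 8) {
            out.put((byte) 0xFF); // all 8 prefix bits set, no payload in the first byte
            out.putLong(value);   // the full 64-bit value follows
            return;
        }
        // First byte: extraBytes leading 1 bits, a 0 bit, then the highest payload bits.
        int prefix = (0xFF << (8 - extraBytes)) & 0xFF;
        out.put((byte) (prefix | (int) (value >>> (8 * extraBytes))));
        for (int i = extraBytes - 1; i >= 0; i--) {
            out.put((byte) (value >>> (8 * i))); // remaining bytes, big-endian
        }
    }

    /** Counts leading 1 bits of a byte: invert, mask to 8 bits, drop the 24 high zero bits. */
    static int extraBytes(byte firstByte) {
        return Integer.numberOfLeadingZeros(~firstByte & 0xFF) - 24;
    }

    static long readUnsignedVInt(ByteBuffer in) {
        byte firstByte = in.get();
        int extra = extraBytes(firstByte);
        if (extra == 0) return firstByte;                // top bit clear: the byte is the value
        if (extra == 8) return in.getLong();             // 0xFF prefix: a full long follows
        long value = firstByte & (0xFF >>> (extra + 1)); // payload bits below the prefix
        if (in.remaining() >= 8) {
            // Read a whole long without consuming it, keep its top `extra` bytes, then advance.
            long raw = in.getLong(in.position());
            in.position(in.position() + extra);
            return (value << (8 * extra)) | (raw >>> (64 - 8 * extra));
        }
        for (int i = 0; i < extra; i++) {
            value = (value << 8) | (in.get() & 0xFF); // byte-at-a-time fallback near the end
        }
        return value;
    }
}
```

For example, for `-1L` (all 64 bits set) `continuationSize` returns 10 while `prefixSize` returns 9, which is the one-byte saving the comment is after; for values of 56 significant bits or fewer both schemes use the same number of bytes.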
> Introduce writeVInt method to DataOutputStreamPlus
> --------------------------------------------------
>
>                 Key: CASSANDRA-9499
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>            Priority: Minor
>             Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, both for fixing
> CASSANDRA-9498 and for efficiently encoding timestamp/deletion deltas. It
> should be possible to make an especially efficient implementation against
> BufferedDataOutputStreamPlus.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
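The issue description expects an especially efficient writeVInt against a buffered stream. A minimal sketch of one way that might work, assuming a `java.nio.ByteBuffer` stands in for the buffer inside BufferedDataOutputStreamPlus (the class `BufferedVIntWriter` and its methods are hypothetical, not Cassandra's API) and using the leading-ones length prefix from the comment above: when at least 8 bytes of space remain, the whole encoding is assembled left-aligned in a single `long`, written with one absolute `putLong`, and the position is advanced by only the encoded size.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical writeVInt fast path against a buffer. Assumes the caller has
 * already ensured room for the encoded bytes (a real stream would flush first).
 */
final class BufferedVIntWriter {

    /** Encoded size: 7 payload bits per byte up to 8 bytes, otherwise 9. */
    static int encodedSize(long value) {
        int magnitude = 64 - Long.numberOfLeadingZeros(value | 1); // significant bits, at least 1
        return Math.min((magnitude + 6) / 7, 9);
    }

    static void writeUnsignedVInt(long value, ByteBuffer buf) {
        int size = encodedSize(value);
        if (size == 1) {
            buf.put((byte) value); // top bit clear, no prefix needed
            return;
        }
        if (size == 9) {
            buf.put((byte) 0xFF);  // 8 leading 1 bits: a full long follows
            buf.putLong(value);
            return;
        }
        int extraBytes = size - 1;
        long prefix = (0xFFL << (8 - extraBytes)) & 0xFF; // extraBytes 1 bits, then a 0 bit
        // Left-align the size-byte big-endian encoding in a 64-bit register.
        long register = (prefix << 56) | (value << (64 - 8 * size));
        if (buf.remaining() >= 8) {
            // One 8-byte store; bytes beyond `size` are past the position and get overwritten later.
            buf.putLong(buf.position(), register);
            buf.position(buf.position() + size);
        } else {
            for (int i = 0; i < size; i++) {
                buf.put((byte) (register >>> (56 - 8 * i))); // byte-at-a-time near the end
            }
        }
    }
}
```

The extra bytes the 8-byte store writes past the encoded size lie beyond the buffer's position, so the next write simply overwrites them; only near the end of the buffer does the sketch fall back to individual `put` calls.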