In compressSingleKeyValue(), we compute the number of bytes to store
timestamp:
timestampFitsInBytes = ByteBufferUtils.longFitsIn(timestamp);
Then the number of bytes to store the diff between the timestamp of
previous cell and current timestamp is computed:
diffTimestampFitsInBytes = ByteBufferUtils.longFitsIn(diffTimestamp);
After that FLAG_TIMESTAMP_IS_DIFF would be set if it is more efficient to
store the diff:
if (diffTimestampFitsInBytes < timestampFitsInBytes) {
flag |= (diffTimestampFitsInBytes - 1) << SHIFT_TIMESTAMP_LENGTH;
flag |= FLAG_TIMESTAMP_IS_DIFF;
This should achieve good efficiency.
Cheers
On Sun, Sep 18, 2016 at 9:37 AM, Lars George <[email protected]> wrote:
> Hi,
>
> Reading the key encoder code and noticed this in the DiffKeyDeltaEncoder:
>
> if ((flag & FLAG_TIMESTAMP_IS_DIFF) == 0) {
> ByteBufferUtils.putLong(out, timestamp, timestampFitsInBytes);
> } else {
> ByteBufferUtils.putLong(out, diffTimestamp, diffTimestampFitsInBytes);
> }
>
> So if the timestamp is the same as the one from the previous cell, we
> store it again in its full fidelity? This makes no sense as we omit
> otherwise all identical fields. I would assume this is a mistake?
>
> In the decoding we do this:
>
> // handle timestamp
> int timestampFitsInBytes =
> ((flag & MASK_TIMESTAMP_LENGTH) >>> SHIFT_TIMESTAMP_LENGTH) + 1;
> long timestamp = ByteBufferUtils.readLong(source, timestampFitsInBytes);
> if ((flag & FLAG_TIMESTAMP_SIGN) != 0) {
> timestamp = -timestamp;
> }
> if ((flag & FLAG_TIMESTAMP_IS_DIFF) != 0) {
> timestamp = state.timestamp - timestamp;
> }
> buffer.putLong(timestamp);
>
> This could be changed to simply use the state.timestamp if the value
> is the same, no? Right now we would add six bytes for the cell
> timestamp when they are identical.
>
> If they are not, we store delta time, so I guess in practice this does
> not happen too often, that we have the same time, unless they are
> client supplied, say for a row to emulate transactions.
>
> Should I open a ticket?
>
> Cheers,
> Lars
>