Re: DiffKeyDeltaEncoder not fully efficient?

Ted Yu Sun, 18 Sep 2016 10:10:19 -0700

In compressSingleKeyValue(), we compute the number of bytes to store
timestamp:


      timestampFitsInBytes = ByteBufferUtils.longFitsIn(timestamp);

Then the number of bytes to store the diff between the timestamp of
previous cell and current timestamp is computed:

      diffTimestampFitsInBytes = ByteBufferUtils.longFitsIn(diffTimestamp);

After that FLAG_TIMESTAMP_IS_DIFF would be set if it is more efficient to
store the diff:

      if (diffTimestampFitsInBytes < timestampFitsInBytes) {

        flag |= (diffTimestampFitsInBytes - 1) << SHIFT_TIMESTAMP_LENGTH;

        flag |= FLAG_TIMESTAMP_IS_DIFF;

This should achieve good efficiency.

Cheers

On Sun, Sep 18, 2016 at 9:37 AM, Lars George <[email protected]> wrote:

> Hi,
>
> Reading the key encoder code and noticed this in the DiffKeyDeltaEncoder:
>
> if ((flag & FLAG_TIMESTAMP_IS_DIFF) == 0) {
>   ByteBufferUtils.putLong(out, timestamp, timestampFitsInBytes);
> } else {
>   ByteBufferUtils.putLong(out, diffTimestamp, diffTimestampFitsInBytes);
> }
>
> So if the timestamp is the same as the one from the previous cell, we
> store it again in its full fidelity? This makes no sense as we omit
> otherwise all identical fields. I would assume this is a mistake?
>
> In the decoding we do this:
>
> // handle timestamp
> int timestampFitsInBytes =
>     ((flag & MASK_TIMESTAMP_LENGTH) >>> SHIFT_TIMESTAMP_LENGTH) + 1;
> long timestamp = ByteBufferUtils.readLong(source, timestampFitsInBytes);
> if ((flag & FLAG_TIMESTAMP_SIGN) != 0) {
>   timestamp = -timestamp;
> }
> if ((flag & FLAG_TIMESTAMP_IS_DIFF) != 0) {
>   timestamp = state.timestamp - timestamp;
> }
> buffer.putLong(timestamp);
>
> This could be changed to simply use the state.timestamp if the value
> is the same, no? Right now we would add six bytes for the cell
> timestamp when they are identical.
>
> If they are not, we store delta time, so I guess in practice this does
> not happen too often, that we have the same time, unless they are
> client supplied, say for a row to emulate transactions.
>
> Should I open a ticket?
>
> Cheers,
> Lars
>

Re: DiffKeyDeltaEncoder not fully efficient?

Reply via email to