You are right Ted, I misread that. Thanks for the second set of eyes! Disregard this email :)
On Sun, Sep 18, 2016 at 7:09 PM, Ted Yu <[email protected]> wrote: > In compressSingleKeyValue(), we compute the number of bytes to store > timestamp: > > timestampFitsInBytes = ByteBufferUtils.longFitsIn(timestamp); > > Then the number of bytes to store the diff between the timestamp of > previous cell and current timestamp is computed: > > diffTimestampFitsInBytes = ByteBufferUtils.longFitsIn(diffTimestamp); > > After that FLAG_TIMESTAMP_IS_DIFF would be set if it is more efficient to > store the diff: > > if (diffTimestampFitsInBytes < timestampFitsInBytes) { > > flag |= (diffTimestampFitsInBytes - 1) << SHIFT_TIMESTAMP_LENGTH; > > flag |= FLAG_TIMESTAMP_IS_DIFF; > > This should achieve good efficiency. > > Cheers > > On Sun, Sep 18, 2016 at 9:37 AM, Lars George <[email protected]> wrote: > >> Hi, >> >> Reading the key encoder code and noticed this in the DiffKeyDeltaEncoder: >> >> if ((flag & FLAG_TIMESTAMP_IS_DIFF) == 0) { >> ByteBufferUtils.putLong(out, timestamp, timestampFitsInBytes); >> } else { >> ByteBufferUtils.putLong(out, diffTimestamp, diffTimestampFitsInBytes); >> } >> >> So if the timestamp is the same as the one from the previous cell, we >> store it again in its full fidelity? This makes no sense as we omit >> otherwise all identical fields. I would assume this is a mistake? >> >> In the decoding we do this: >> >> // handle timestamp >> int timestampFitsInBytes = >> ((flag & MASK_TIMESTAMP_LENGTH) >>> SHIFT_TIMESTAMP_LENGTH) + 1; >> long timestamp = ByteBufferUtils.readLong(source, timestampFitsInBytes); >> if ((flag & FLAG_TIMESTAMP_SIGN) != 0) { >> timestamp = -timestamp; >> } >> if ((flag & FLAG_TIMESTAMP_IS_DIFF) != 0) { >> timestamp = state.timestamp - timestamp; >> } >> buffer.putLong(timestamp); >> >> This could be changed to simply use the state.timestamp if the value >> is the same, no? Right now we would add six bytes for the cell >> timestamp when they are identical. >> >> If they are not, we store delta time, so I guess in practice this does >> not happen too often, that we have the same time, unless they are >> client supplied, say for a row to emulate transactions. >> >> Should I open a ticket? >> >> Cheers, >> Lars >>
