HBase Types: Explicit Null Support

Nick Dimiduk Mon, 01 Apr 2013 11:01:35 -0700

Heya,

Thinking about data types and serialization. I think null support is an
important characteristic for the serialized representations, especially
when considering the compound type. However, doing so is directly
incompatible with fixed-width representations for numerics. For instance,
if we want to have a fixed-width signed long stored in 8 bytes, where do
you put null? The float and double types can cheat a little by folding
negative and positive NaNs into a single representation (this isn't
strictly correct!), leaving a place to represent null. In the long example
case, the obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by
one. This frees up one encoding, which can then be used for null. My
experience working with scientific data, however, makes me wince at the
idea.
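
To make the trade-off concrete, here is a rough sketch of the "increase
MIN_VALUE by one" option for the 8-byte long. The class and method names are
made up for illustration, and nothing here is an existing HBase API. It
assumes the usual trick of flipping the sign bit so that unsigned byte order
matches signed numeric order, which makes the slot given up for null the
all-zero one.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of HBase: a fixed-width, order-preserving
// encoding of a nullable long that gives up Long.MIN_VALUE to make room for null.
final class NullableLongCodec {
    private NullableLongCodec() {}

    /** Encodes a nullable long into exactly 8 bytes; null takes the all-zero slot. */
    static byte[] encode(Long value) {
        if (value == null) {
            return new byte[8]; // eight 0x00 bytes, sorts before every other encoding
        }
        if (value == Long.MIN_VALUE) {
            // This is the cost of the scheme: one legal long can no longer be stored.
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        }
        // Flip the sign bit so big-endian unsigned byte order equals signed order.
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    /** Decodes 8 bytes produced by encode; the all-zero pattern decodes to null. */
    static Long decode(byte[] bytes) {
        long raw = ByteBuffer.wrap(bytes).getLong();
        return raw == 0L ? null : Long.valueOf(raw ^ Long.MIN_VALUE);
    }
}
```

With this layout, encode(null) compares lower than encode(Long.MIN_VALUE + 1)
under an unsigned lexicographic byte comparison, and the rest of the range
keeps its numeric order. The lost value is exactly what bothers me.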


The variable-width encodings have it a little easier. There's already
enough going on that it's simpler to make room.

Remember, the final goal is to support order-preserving serialization. This
imposes some limitations on our encoding strategies. For instance, it's not
enough to simply encode null; it really needs to be encoded as 0x00 so that
it sorts lexicographically earlier than any other value.
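
For the variable-width case, a minimal sketch of the idea could look like the
following. The names are hypothetical, and the 0x01 header for non-null
values is my own assumption; the only requirement stated above is that null
is 0x00. It handles a single value only and ignores the terminator or
escaping a compound type would need.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HBase: a variable-width encoding whose
// first byte says whether a value is present, so that null sorts first.
final class NullFirstCodec {
    private static final byte NULL_HEADER = 0x00;     // null is the single byte 0x00
    private static final byte NOT_NULL_HEADER = 0x01; // assumed header for real values

    private NullFirstCodec() {}

    /** Encodes a nullable payload; null becomes {0x00}, anything else 0x01 + payload. */
    static byte[] encode(byte[] payload) {
        if (payload == null) {
            return new byte[] {NULL_HEADER};
        }
        byte[] out = new byte[payload.length + 1];
        out[0] = NOT_NULL_HEADER;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    /** Decodes bytes produced by encode; a 0x00 header decodes to null. */
    static byte[] decode(byte[] bytes) {
        if (bytes[0] == NULL_HEADER) {
            return null;
        }
        return Arrays.copyOfRange(bytes, 1, bytes.length);
    }
}
```

Because every non-null encoding starts with 0x01, null sorts ahead of even an
empty payload or a payload made up of 0x00 bytes, at the price of one extra
byte per value.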

What do you think? Any ideas, experiences, etc?

Thanks,
Nick
