Re: HBase Types: Explicit Null Support

Nick Dimiduk Mon, 01 Apr 2013 19:27:21 -0700

Furthermore, is it more important to support null values than to squeeze all
representations into minimum size (4 bytes for int32, etc.)?
On Apr 1, 2013 4:41 PM, "Nick Dimiduk" <[email protected]> wrote:


> On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <[email protected]> wrote:
>
>> From the SQL perspective, handling null is important.
>
>
> From your perspective, is it critical to support NULLs, even at the
> expense of fixed-width encodings or of representing the full range of
> values? That is, you'd rather be able to represent NULL than -2^31?
>
> On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
>>
>>> Thanks for the thoughtful response (and code!).
>>>
>>> I'm thinking I will press forward with a base implementation that does
>>> not support nulls. The idea is to provide an extensible set of
>>> interfaces, so I think this will not box us into a corner later. That
>>> is, a mirroring package could be implemented that supports null values
>>> and accepts the relevant trade-offs.
>>>
>>> Thanks,
>>> Nick
>>>
>>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <[email protected]>
>>> wrote:
>>>
>>>> I spent some time this weekend extracting bits of our serialization
>>>> code to a public GitHub repo at
>>>> http://github.com/hotpads/data-tools.
>>>> Contributions are welcome - I'm sure we all have this stuff lying
>>>> around.
>>>>
>>>> You can see I've bumped into the NULL problem in a few places:
>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>>>>
>>>> Looking back, I think my latest opinion on the topic is to reject
>>>> nullability as the rule since it can cause unexpected behavior and
>>>> confusion.  It's cleaner to provide a wrapper class (so both
>>>> LongArrayList and NullableLongArrayList) that explicitly defines the
>>>> behavior and costs a little more in performance.  If the user can't
>>>> find a pre-made wrapper class, it's not very difficult for each user to
>>>> provide their own interpretation of null and check for it themselves.
>>>>
>>>> If you reject nullability, the question becomes what to do in situations
>>>> where you're implementing existing interfaces that accept nullable
>>>> params.  The LongArrayList above implements List<Long>, which requires
>>>> an add(Long) method.  In the above implementation I chose to swap nulls
>>>> with Long.MIN_VALUE; however, I'm now thinking it best to force the user
>>>> to make that swap and then throw IllegalArgumentException if they pass
>>>> null.
>>>>
>>>>
>>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <[email protected]> wrote:
>>>>
>>>>> Hmmm... good question.
>>>>>
>>>>> I think that fixed-width support is important for a great many rowkey
>>>>> constructs, so I'd rather see something like losing MIN_VALUE and
>>>>> keeping fixed width.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" <[email protected]> wrote:
>>>>>
>>>>>> Heya,
>>>>>>
>>>>>> Thinking about data types and serialization. I think null support is
>>>>>> an important characteristic for the serialized representations,
>>>>>> especially when considering the compound type. However, doing so is
>>>>>> directly incompatible with fixed-width representations for numerics.
>>>>>> For instance, if we want to have a fixed-width signed long stored on
>>>>>> 8 bytes, where do you put null? float and double types can cheat a
>>>>>> little by folding negative and positive NaNs into a single
>>>>>> representation (this isn't strictly correct!), leaving a place to
>>>>>> represent null. In the long example case, the obvious choice is to
>>>>>> reduce MAX_VALUE or increase MIN_VALUE by one. This will allocate an
>>>>>> additional encoding which can be used for null. My experience working
>>>>>> with scientific data, however, makes me wince at the idea.
>>>>>>
>>>>>> The variable-width encodings have it a little easier. There's already
>>>>>> enough going on that it's simpler to make room.
>>>>>>
>>>>>> Remember, the final goal is to support order-preserving serialization.
>>>>>> This imposes some limitations on our encoding strategies. For
>>>>>> instance, it's not enough to simply encode null; it really needs to be
>>>>>> encoded as 0x00 so as to sort lexicographically earlier than any other
>>>>>> value.
>>>>>>
>>>>>> What do you think? Any ideas, experiences, etc?
>>>>>>
>>>>>> Thanks,
>>>>>> Nick
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>
>
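
For readers following the thread, the trade-off discussed above (keep a
fixed 8-byte width, give up one value so null has an encoding, and make null
sort first) can be sketched roughly as follows. This is only an illustration
under stated assumptions, not code from HBase or from the data-tools repo:
the class name NullableOrderedLong and its methods are hypothetical, and
giving up Long.MIN_VALUE (rather than MAX_VALUE) is one of the two options
mentioned in the thread.

```java
// Hypothetical sketch: fixed-width, order-preserving long encoding that
// reserves one bit pattern for null. Not an HBase API.
final class NullableOrderedLong {
    // Every encoded value, including null, occupies exactly 8 bytes.
    static final int WIDTH = 8;

    private NullableOrderedLong() {}

    // Encodes as big-endian bytes with the sign bit flipped, so unsigned
    // lexicographic byte order matches signed numeric order. null takes
    // the all-zero pattern that Long.MIN_VALUE would otherwise use, so
    // Long.MIN_VALUE itself is rejected.
    static byte[] encode(Long value) {
        byte[] out = new byte[WIDTH];
        if (value == null) {
            return out; // all 0x00: sorts before every non-null value
        }
        if (value == Long.MIN_VALUE) {
            throw new IllegalArgumentException(
                "Long.MIN_VALUE is reserved to represent null");
        }
        long bits = value ^ Long.MIN_VALUE; // flip the sign bit
        for (int i = WIDTH - 1; i >= 0; i--) {
            out[i] = (byte) bits;
            bits >>>= 8;
        }
        return out;
    }

    // Inverse of encode: the all-zero pattern decodes to null.
    static Long decode(byte[] encoded) {
        if (encoded.length != WIDTH) {
            throw new IllegalArgumentException("expected " + WIDTH + " bytes");
        }
        long bits = 0;
        for (byte b : encoded) {
            bits = (bits << 8) | (b & 0xFF);
        }
        if (bits == 0) {
            return null;
        }
        return bits ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, i.e. the order in which raw
    // byte[] keys are sorted.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With this sketch, encode(null) compares lower than encode(Long.MIN_VALUE + 1),
which in turn compares lower than encode(0L), so null sorts ahead of every
representable value while the width stays fixed. The cost is exactly the one
raised in the thread: Long.MIN_VALUE can no longer be stored.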
