Re: HBase Types: Explicit Null Support

Matt Corgan Mon, 01 Apr 2013 23:17:49 -0700

Ah, I didn't even realize SQL allowed null key parts.  Maybe a goal of the
interfaces should be to provide first-class support for custom user types
in addition to the standard ones included.  Part of the power of HBase's
plain byte[] keys is that users can concoct the perfect key for their data
type.  For example, I have a lot of geographic data where I interleave
latitude/longitude bits into a sortable 64-bit value that would probably
never be included in a standard library.
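
A minimal sketch of that kind of interleaved key, for illustration only. The
class name GeoKey, the scaling of each coordinate to 32 bits, and the choice
to put latitude in the odd bit positions are all assumptions, not taken from
any existing library. Inputs are assumed to be within the usual
latitude/longitude ranges.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: interleaves latitude/longitude bits (Z-order style).
final class GeoKey {
    // Map a coordinate in [min, max] onto an unsigned 32-bit value held in a long.
    static long scale(double value, double min, double max) {
        double fraction = (value - min) / (max - min);
        return (long) (fraction * 0xFFFFFFFFL);
    }

    // Spread the low 32 bits of x so that bit i moves to bit 2*i.
    static long spread(long x) {
        x &= 0xFFFFFFFFL;
        x = (x | (x << 16)) & 0x0000FFFF0000FFFFL;
        x = (x | (x << 8)) & 0x00FF00FF00FF00FFL;
        x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0FL;
        x = (x | (x << 2)) & 0x3333333333333333L;
        x = (x | (x << 1)) & 0x5555555555555555L;
        return x;
    }

    // Latitude bits land in odd positions, longitude bits in even positions.
    static long interleave(double lat, double lon) {
        return (spread(scale(lat, -90.0, 90.0)) << 1)
                | spread(scale(lon, -180.0, 180.0));
    }

    // Big-endian bytes, so unsigned byte-wise comparison follows the
    // unsigned order of the interleaved value.
    static byte[] toKey(long interleaved) {
        return ByteBuffer.allocate(8).putLong(interleaved).array();
    }
}
```

Nearby points tend to share key prefixes under this scheme, which is what
makes it useful for range scans, but it is locality-preserving rather than
a strict geographic ordering.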



On Mon, Apr 1, 2013 at 8:38 PM, Enis Söztutar <[email protected]> wrote:

> I think having Int32, and NullableInt32 would support minimum overhead, as
> well as allowing SQL semantics.
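
One way to picture the Int32 / NullableInt32 split, as a rough sketch only.
The type names come from the message above, but the class Int32Codecs and
the specific encodings (sign-bit flip, one leading marker byte with 0x00 for
null) are assumptions made for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical codecs illustrating the two types side by side.
final class Int32Codecs {
    // Non-null: exactly 4 bytes. Flipping the sign bit makes unsigned
    // byte-wise order match signed int order.
    static byte[] encodeInt32(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    static int decodeInt32(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Nullable: a marker byte comes first. 0x00 alone means null and sorts
    // before every non-null value, which starts with 0x01.
    static byte[] encodeNullableInt32(Integer v) {
        if (v == null) {
            return new byte[] {0x00};
        }
        return ByteBuffer.allocate(5)
                .put((byte) 0x01)
                .putInt(v ^ Integer.MIN_VALUE)
                .array();
    }

    static Integer decodeNullableInt32(byte[] b) {
        if (b[0] == 0x00) {
            return null;
        }
        return ByteBuffer.wrap(b, 1, 4).getInt() ^ Integer.MIN_VALUE;
    }
}
```

In this sketch the non-null type keeps the full int range in 4 bytes, and
only callers who ask for nullability pay for the extra byte.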
>
>
> On Mon, Apr 1, 2013 at 7:26 PM, Nick Dimiduk <[email protected]> wrote:
>
> > Furthermore, is it more important to support null values than squeeze all
> > representations into minimum size (4-bytes for int32, &c.)?
> > On Apr 1, 2013 4:41 PM, "Nick Dimiduk" <[email protected]> wrote:
> >
> > > On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <[email protected]> wrote:
> > >
> > >> From the SQL perspective, handling null is important.
> > >
> > >
> > > From your perspective, it is critical to support NULLs, even at the
> > > expense of giving up fixed-width encodings or the ability to represent
> > > the full range of values? That is, you'd rather be able to represent
> > > NULL than -2^31?
> > >
> > > On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> > >>
> > >>> Thanks for the thoughtful response (and code!).
> > >>>
> > > >>> I'm thinking I will press forward with a base implementation that does
> > > >>> not support nulls. The idea is to provide an extensible set of
> > > >>> interfaces, so I think this will not box us into a corner later. That
> > > >>> is, a mirroring package could be implemented that supports null values
> > > >>> and accepts the relevant trade-offs.
> > >>>
> > >>> Thanks,
> > >>> Nick
> > >>>
> > >>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <[email protected]>
> > >>> wrote:
> > >>>
> > > >>>> I spent some time this weekend extracting bits of our serialization
> > > >>>> code to a public GitHub repo at http://github.com/hotpads/data-tools.
> > > >>>> Contributions are welcome - I'm sure we all have this stuff lying
> > > >>>> around.
> > > >>>>
> > > >>>> You can see I've bumped into the NULL problem in a few places:
> > > >>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> > > >>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
> > >>>>
> > > >>>> Looking back, I think my latest opinion on the topic is to reject
> > > >>>> nullability as the rule since it can cause unexpected behavior and
> > > >>>> confusion.  It's cleaner to provide a wrapper class (so both
> > > >>>> LongArrayList plus NullableLongArrayList) that explicitly defines the
> > > >>>> behavior, and costs a little more in performance.  If the user can't
> > > >>>> find a pre-made wrapper class, it's not very difficult for each user to
> > > >>>> provide their own interpretation of null and check for it themselves.
> > > >>>>
> > > >>>> If you reject nullability, the question becomes what to do in situations
> > > >>>> where you're implementing existing interfaces that accept nullable
> > > >>>> params.  The LongArrayList above implements List<Long> which requires an
> > > >>>> add(Long) method.  In the above implementation I chose to swap nulls
> > > >>>> with Long.MIN_VALUE, however I'm now thinking it best to force the user
> > > >>>> to make that swap and then throw IllegalArgumentException if they pass
> > > >>>> null.
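
The stricter behavior described in that last paragraph might look roughly
like the sketch below. This is not the data-tools LongArrayList; the class
name StrictLongArrayList and its internals are hypothetical and only show
the "throw on null" contract for add(Long).

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical primitive-backed list that refuses nulls instead of
// silently swapping them for Long.MIN_VALUE.
final class StrictLongArrayList extends AbstractList<Long> {
    private long[] values = new long[8];
    private int size;

    @Override
    public boolean add(Long value) {
        if (value == null) {
            // The caller must pick a sentinel explicitly if one is wanted.
            throw new IllegalArgumentException("null elements are not supported");
        }
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2);
        }
        values[size++] = value;
        modCount++; // keeps AbstractList iterators fail-fast
        return true;
    }

    @Override
    public Long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return values[index];
    }

    @Override
    public int size() {
        return size;
    }
}
```

With this contract Long.MIN_VALUE remains an ordinary storable value, and
any null-to-sentinel mapping is visible in the caller's code.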
> > >>>>
> > >>>>
> > > >>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <[email protected]> wrote:
> > > >>>>
> > > >>>>> Hmmm... good question.
> > > >>>>>
> > > >>>>> I think that fixed-width support is important for a great many rowkey
> > > >>>>> constructs, so I'd rather see something like losing MIN_VALUE and
> > > >>>>> keeping fixed width.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" <[email protected]> wrote:
> > >>>>>
> > > >>>>>> Heya,
> > > >>>>>>
> > > >>>>>> Thinking about data types and serialization. I think null support is
> > > >>>>>> an important characteristic for the serialized representations,
> > > >>>>>> especially when considering the compound type. However, doing so is
> > > >>>>>> directly incompatible with fixed-width representations for numerics.
> > > >>>>>> For instance, if we want to have a fixed-width signed long stored on
> > > >>>>>> 8 bytes, where do you put null? float and double types can cheat a
> > > >>>>>> little by folding negative and positive NaNs into a single
> > > >>>>>> representation (this isn't strictly correct!), leaving a place to
> > > >>>>>> represent null. In the long example case, the obvious choice is to
> > > >>>>>> reduce MAX_VALUE or increase MIN_VALUE by one. This will free up an
> > > >>>>>> additional encoding which can be used for null. My experience working
> > > >>>>>> with scientific data, however, makes me wince at the idea.
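
To make the fixed-width trade-off concrete, here is a sketch of the
"increase MIN_VALUE by one" option for an 8-byte long. The class name
SentinelLongCodec and the sign-bit-flip encoding are assumptions for
illustration; the all-zero encoding that would otherwise represent
Long.MIN_VALUE is reserved for null.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-width codec: always 8 bytes, null takes over the
// encoding that Long.MIN_VALUE would otherwise use.
final class SentinelLongCodec {
    static byte[] encode(Long v) {
        if (v == null) {
            return new byte[8]; // eight 0x00 bytes, sorts before everything
        }
        if (v == Long.MIN_VALUE) {
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        }
        // Flip the sign bit so unsigned byte order matches signed order.
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    static Long decode(byte[] b) {
        long raw = ByteBuffer.wrap(b).getLong();
        if (raw == 0L) {
            return null;
        }
        return raw ^ Long.MIN_VALUE;
    }
}
```

The cost is exactly the one that prompts the wince above: one legitimate
value can no longer be stored.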
> > >>>>>>
> > > >>>>>> The variable-width encodings have it a little easier. There's already
> > > >>>>>> enough going on that it's simpler to make room.
> > > >>>>>>
> > > >>>>>> Remember, the final goal is to support order-preserving serialization.
> > > >>>>>> This imposes some limitations on our encoding strategies. For instance,
> > > >>>>>> it's not enough to simply encode null, it really needs to be encoded as
> > > >>>>>> 0x00 so as to sort lexicographically earlier than any other value.
> > >>>>>>
> > >>>>>> What do you think? Any ideas, experiences, etc?
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Nick
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>
> > >
> >
>
