I generally don't allow nulls in my composite row keys. Does SQL allow nulls in the PK? In the rare case where I wanted to do that, I might create a separate format called NullableCInt32 that is 5 bytes wide, where the first byte determines null. It's important to keep the pure types pure.
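To make the 5-byte idea concrete, here is a minimal sketch of what such a format could look like. Only the name NullableCInt32 and the "first byte determines null" layout come from the paragraph above; the class name NullableInt32Codec, the marker values 0x00/0x01, the big-endian payload, and the sign-bit flip are assumptions chosen so that the encoded bytes sort in the same order as the values, with null first.

```java
import java.nio.ByteBuffer;

/** Hypothetical 5-byte nullable int32 codec; names and layout are assumptions. */
final class NullableInt32Codec {
    static final int WIDTH = 5;
    private static final byte NULL_MARKER = 0x00;    // assumed: null sorts before every value
    private static final byte PRESENT_MARKER = 0x01; // assumed: any non-null value

    /** Encodes a possibly-null Integer into exactly 5 bytes. */
    static byte[] encode(Integer value) {
        byte[] out = new byte[WIDTH];
        if (value == null) {
            return out; // marker 0x00 followed by a zeroed payload
        }
        out[0] = PRESENT_MARKER;
        // Flip the sign bit so negative ints sort before positive ones
        // when the big-endian bytes are compared as unsigned values.
        ByteBuffer.wrap(out, 1, 4).putInt(value ^ Integer.MIN_VALUE);
        return out;
    }

    /** Decodes 5 bytes produced by encode(); returns null for the null marker. */
    static Integer decode(byte[] bytes) {
        if (bytes.length != WIDTH) {
            throw new IllegalArgumentException("expected " + WIDTH + " bytes, got " + bytes.length);
        }
        if (bytes[0] == NULL_MARKER) {
            return null;
        }
        return ByteBuffer.wrap(bytes, 1, 4).getInt() ^ Integer.MIN_VALUE;
    }

    /** Lexicographic comparison of unsigned bytes, the way row keys are ordered. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

The non-nullable 4-byte type keeps its full range under this scheme; only callers that opt into the wider format pay the extra byte.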
I do have lots of null *values*, but they're represented by the lack of a qualifier in the Put. If a row has all null values, I create a dummy qualifier with a dummy value to make sure the row key gets inserted as it would in SQL.

On Mon, Apr 1, 2013 at 4:49 PM, James Taylor <[email protected]> wrote:

> On 04/01/2013 04:41 PM, Nick Dimiduk wrote:
>
>> On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <[email protected]> wrote:
>>
>>> From the SQL perspective, handling null is important.
>>
>> From your perspective, it is critical to support NULLs, even at the
>> expense of fixed-width encodings at all or supporting representation
>> of a full range of values. That is, you'd rather be able to represent
>> NULL than -2^31?
>
> We've been able to get away with supporting NULL through the absence of
> the value rather than restricting the data range. We haven't had any push
> back on not allowing a fixed width nullable leading row key column. Since
> our variable length DECIMAL supports null and is a superset of the fixed
> width numeric types, users have a reasonable alternative.
>
> I'd rather not restrict the range of values, since it doesn't seem like
> this would be necessary.
>
>>> On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
>>>
>>>> Thanks for the thoughtful response (and code!).
>>>>
>>>> I'm thinking I will press forward with a base implementation that does
>>>> not support nulls. The idea is to provide an extensible set of
>>>> interfaces, so I think this will not box us into a corner later. That
>>>> is, a mirroring package could be implemented that supports null values
>>>> and accepts the relevant trade-offs.
>>>>
>>>> Thanks,
>>>> Nick
>>>>
>>>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <[email protected]> wrote:
>>>>
>>>>> I spent some time this weekend extracting bits of our serialization
>>>>> code to a public github repo at
>>>>> http://github.com/hotpads/data-tools .
>>>>> Contributions are welcome - i'm sure we all have this stuff laying
>>>>> around.
>>>>>
>>>>> You can see I've bumped into the NULL problem in a few places:
>>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
>>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>>>>>
>>>>> Looking back, I think my latest opinion on the topic is to reject
>>>>> nullability as the rule since it can cause unexpected behavior and
>>>>> confusion. It's cleaner to provide a wrapper class (so both
>>>>> LongArrayList plus NullableLongArrayList) that explicitly defines the
>>>>> behavior, and costs a little more in performance.
>>>>> If the user can't find a pre-made wrapper class, it's not very
>>>>> difficult for each user to provide their own interpretation of null
>>>>> and check for it themselves.
>>>>>
>>>>> If you reject nullability, the question becomes what to do in
>>>>> situations where you're implementing existing interfaces that accept
>>>>> nullable params. The LongArrayList above implements List<Long> which
>>>>> requires an add(Long) method. In the above implementation I chose to
>>>>> swap nulls with Long.MIN_VALUE, however I'm now thinking it best to
>>>>> force the user to make that swap and then throw
>>>>> IllegalArgumentException if they pass null.
>>>>>
>>>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <[email protected]> wrote:
>>>>>
>>>>>> Hmmm... good question.
>>>>>>
>>>>>> I think that fixed width support is important for a great many rowkey
>>>>>> constructs cases, so I'd rather see something like losing MIN_VALUE
>>>>>> and keeping fixed width.
>>>>>>
>>>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" <[email protected]> wrote:
>>>>>>
>>>>>>> Heya,
>>>>>>>
>>>>>>> Thinking about data types and serialization. I think null support is
>>>>>>> an important characteristic for the serialized representations,
>>>>>>> especially when considering the compound type. However, doing so is
>>>>>>> directly incompatible with fixed-width representations for numerics.
>>>>>>> For instance, if we want to have a fixed-width signed long stored on
>>>>>>> 8-bytes, where do you put null? float and double types can cheat a
>>>>>>> little by folding negative and positive NaN's into a single
>>>>>>> representation (this isn't strictly correct!), leaving a place to
>>>>>>> represent null. In the long example case, the obvious choice is to
>>>>>>> reduce MAX_VALUE or increase MIN_VALUE by one.
>>>>>>> This will allocate an additional encoding which can be used for null.
>>>>>>> My experience working with scientific data, however, makes me wince
>>>>>>> at the idea.
>>>>>>>
>>>>>>> The variable-width encodings have it a little easier. There's already
>>>>>>> enough going on that it's simpler to make room.
>>>>>>>
>>>>>>> Remember, the final goal is to support order-preserving
>>>>>>> serialization. This imposes some limitations on our encoding
>>>>>>> strategies. For instance, it's not enough to simply encode null, it
>>>>>>> really needs to be encoded as 0x00 so as to sort lexicographically
>>>>>>> earlier than any other value.
>>>>>>>
>>>>>>> What do you think? Any ideas, experiences, etc?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nick
