Thanks for the thoughtful response (and code!). I'm thinking I will press forward with a base implementation that does not support nulls. The idea is to provide an extensible set of interfaces, so I think this will not box us into a corner later. That is, a mirroring package could be implemented that supports null values and accepts the relevant trade-offs.
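To make the shape of that idea concrete, here is a rough sketch of a null-rejecting base type plus a separate wrapper that adds nullability. Every name in it (`Codec`, `LongCodec`, `NullableCodec`) is hypothetical and chosen only for illustration, and the one-byte marker is just one possible trade-off. It uses plain big-endian bytes and makes no attempt at order preservation.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical interface; not taken from any existing library.
interface Codec<T> {
    byte[] encode(T value);
    T decode(byte[] bytes);
}

/** Base implementation: fixed-width 8-byte long that rejects null outright. */
final class LongCodec implements Codec<Long> {
    @Override
    public byte[] encode(Long value) {
        if (value == null) {
            throw new IllegalArgumentException("null is not supported by LongCodec");
        }
        // Plain big-endian layout; sort order of negatives is not addressed here.
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    @Override
    public Long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }
}

/** "Mirroring" wrapper: supports null at the cost of one extra marker byte. */
final class NullableCodec<T> implements Codec<T> {
    private static final byte NULL_MARKER = 0x00;
    private static final byte PRESENT_MARKER = 0x01;
    private final Codec<T> inner;

    NullableCodec(Codec<T> inner) {
        this.inner = inner;
    }

    @Override
    public byte[] encode(T value) {
        if (value == null) {
            return new byte[] { NULL_MARKER };
        }
        byte[] body = inner.encode(value);
        byte[] out = new byte[body.length + 1];
        out[0] = PRESENT_MARKER;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    @Override
    public T decode(byte[] bytes) {
        if (bytes[0] == NULL_MARKER) {
            return null;
        }
        return inner.decode(Arrays.copyOfRange(bytes, 1, bytes.length));
    }
}
```

The base type stays fixed-width and simple, and anyone who needs null opts into the wrapper and its extra byte explicitly.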
Thanks,
Nick

On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <mcor...@hotpads.com> wrote:

> I spent some time this weekend extracting bits of our serialization code to
> a public github repo at http://github.com/hotpads/data-tools.
> Contributions are welcome - I'm sure we all have this stuff laying around.
>
> You can see I've bumped into the NULL problem in a few places:
> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>
> Looking back, I think my latest opinion on the topic is to reject
> nullability as the rule since it can cause unexpected behavior and
> confusion. It's cleaner to provide a wrapper class (so both LongArrayList
> plus NullableLongArrayList) that explicitly defines the behavior, and costs
> a little more in performance. If the user can't find a pre-made wrapper
> class, it's not very difficult for each user to provide their own
> interpretation of null and check for it themselves.
>
> If you reject nullability, the question becomes what to do in situations
> where you're implementing existing interfaces that accept nullable params.
> The LongArrayList above implements List<Long> which requires an add(Long)
> method. In the above implementation I chose to swap nulls with
> Long.MIN_VALUE, however I'm now thinking it best to force the user to make
> that swap and then throw IllegalArgumentException if they pass null.
>
>
> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <doug.m...@explorysmedical.com> wrote:
>
> > Hmmm... good question.
> >
> > I think that fixed width support is important for a great many rowkey
> > constructs cases, so I'd rather see something like losing MIN_VALUE and
> > keeping fixed width.
> >
> >
> > On 4/1/13 2:00 PM, "Nick Dimiduk" <ndimi...@gmail.com> wrote:
> >
> > > Heya,
> > >
> > > Thinking about data types and serialization. I think null support is an
> > > important characteristic for the serialized representations, especially
> > > when considering the compound type. However, doing so is directly
> > > incompatible with fixed-width representations for numerics. For instance,
> > > if we want to have a fixed-width signed long stored on 8-bytes, where do
> > > you put null? float and double types can cheat a little by folding
> > > negative and positive NaN's into a single representation (this isn't
> > > strictly correct!), leaving a place to represent null. In the long
> > > example case, the obvious choice is to reduce MAX_VALUE or increase
> > > MIN_VALUE by one. This will allocate an additional encoding which can be
> > > used for null. My experience working with scientific data, however, makes
> > > me wince at the idea.
> > >
> > > The variable-width encodings have it a little easier. There's already
> > > enough going on that it's simpler to make room.
> > >
> > > Remember, the final goal is to support order-preserving serialization.
> > > This imposes some limitations on our encoding strategies. For instance,
> > > it's not enough to simply encode null, it really needs to be encoded as
> > > 0x00 so as to sort lexicographically earlier than any other value.
> > >
> > > What do you think? Any ideas, experiences, etc?
> > >
> > > Thanks,
> > > Nick
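
For reference, the "lose MIN_VALUE, keep fixed width" option from the quoted thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the class and method names (`OrderedLong`, `compareUnsigned`) are made up, and flipping the sign bit before writing big-endian bytes is one common way to get an order-preserving layout, not something the thread settled on. With that layout Long.MIN_VALUE would map to eight 0x00 bytes, so giving that encoding to null makes null sort before every remaining value.

```java
// Hypothetical helper; names and layout are assumptions for illustration only.
final class OrderedLong {
    static final int WIDTH = 8;

    /** Encodes to 8 bytes whose unsigned lexicographic order matches numeric order. */
    static byte[] encode(Long value) {
        if (value == null) {
            return new byte[WIDTH]; // all 0x00: sorts before every non-null value
        }
        if (value == Long.MIN_VALUE) {
            // This value's natural encoding is the all-zero pattern reserved for null.
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        }
        long flipped = value ^ Long.MIN_VALUE; // flip the sign bit
        byte[] out = new byte[WIDTH];
        for (int i = WIDTH - 1; i >= 0; i--) { // big-endian: most significant byte first
            out[i] = (byte) flipped;
            flipped >>>= 8;
        }
        return out;
    }

    /** Inverse of encode; the all-zero pattern decodes to null. */
    static Long decode(byte[] bytes) {
        long flipped = 0;
        for (int i = 0; i < WIDTH; i++) {
            flipped = (flipped << 8) | (bytes[i] & 0xFF);
        }
        if (flipped == 0) {
            return null;
        }
        return flipped ^ Long.MIN_VALUE;
    }

    /** Unsigned byte-wise comparison, the order a sorted byte[] key store would use. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < WIDTH; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }
}
```

The cost is exactly the one raised in the thread: one legal long value can no longer be stored, which matters for data sets that really do use the full range.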