Re: Review request for HBASE-7692: Ordered byte[] serialization

Stack Tue, 26 Feb 2013 15:18:22 -0800

On Fri, Feb 22, 2013 at 5:40 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:


> I think we're getting ahead of ourselves a bit here. First and foremost,
> I'm looking for consensus that HBase should ship with tools for serializing
> Java primitive types such that the byte[] representations maintain sorted
> order. This is primarily to the benefit of users of HBase in that 3rd party
> tools can enjoy interoperability insofar as HBase provides it (i.e., I
> can write a Pig script that writes a long and my Hive queries can read that
> value). Furthermore, the implementations of these tools benefit from the
> order-preserving representation.
>
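
For readers following along, here is a minimal Java sketch of what "the byte[]
representations maintain sorted order" can mean for a long. It is not the
HBASE-7692 API: the class and method names are hypothetical, and the encoding
(big-endian with the sign bit flipped) is one common technique assumed here
for illustration only. A plain two's-complement big-endian encoding would sort
negative values after positive ones under unsigned byte comparison; flipping
the sign bit avoids that.

```java
import java.util.Arrays;

public class OrderedLongDemo {
    // Hypothetical helper: encode a long as 8 big-endian bytes with the sign
    // bit flipped, so unsigned byte order matches signed numeric order.
    static byte[] encode(long v) {
        long u = v ^ Long.MIN_VALUE; // flip the sign bit
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) u;       // lowest byte goes last (big-endian)
            u >>>= 8;
        }
        return out;
    }

    // Inverse of encode(): rebuild the long and flip the sign bit back.
    static long decode(byte[] b) {
        long u = 0;
        for (int i = 0; i < 8; i++) {
            u = (u << 8) | (b[i] & 0xFF);
        }
        return u ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of byte arrays, the ordering a
    // sorted byte[] key space uses.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long[] values = {-5L, 42L, Long.MIN_VALUE, 0L, Long.MAX_VALUE, -1L};
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = encode(values[i]);
        }
        // Sort by raw bytes only, then decode to show numeric order survived.
        Arrays.sort(encoded, OrderedLongDemo::compareBytes);
        for (byte[] e : encoded) {
            System.out.println(decode(e));
        }
    }
}
```

Any tool that agrees on such an encoding can write a value that another tool
reads back, and range scans over the encoded keys follow numeric order.
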
> Assuming this capacity is agreed to be desirable, I propose the adoption of
> this orphaned community library. I have no particular love for the name of
> the package, nor am I concerned terribly about which module it resides in.
> Personally, I think it should ship with (explicitly or as a dependency of)
> the hbase-client module that will exist in 0.96. This is my preference
> because I think the client API should be extended to use said serialization
> format directly -- finally, HBase could "support" types other than byte[].
> That would be a much larger change, however, and I am not interested in
> pressing it for this initial discussion.
>
> This introduction does not in any way affect the existing Bytes utility.
> Server components can continue to use it for marshaling their own
> primitives. This library is of interest primarily to consumers of the HBase
> client API. (I'd prefer to see Bytes deprecated from client use entirely!)
> I do not think this library or its *optional* builder pattern should be
> used inside of the RegionServer. See also HBASE-7221 for another user who
> is asking for this kind of builder pattern. The Builder and Iterator utils
> are only a convenience API, providing sugar on top of the underlying
> StructRowKey implementation. Users interested in producing or consuming
> compound objects within a tight loop need not bother with either of them.
>
> As for the implementation details and dependency on Hadoop Writables: it is
> my opinion that so long as its dependencies are compatible with the rest of
> HBase, it's no big deal. From that perspective, dependence on Hadoop
> Writable implementations is entirely reasonable for an initial inclusion.
> If, down the road, we wish to reduce dependencies (a practice I generally
> support) and in so doing it becomes useful to change this implementation
> detail, so be it. Say, for example, we want to release an hbase-client jar
> that has no dependency on any Hadoop types, I say go for it. The patch I
> have contributed tags all of these classes as "Evolving" interfaces, and
> nothing is set in stone until a release manager and the community bless a
> new release. I'm happy to work with whoever is interested toward
> modernizing implementation details once the initial code is in place.
>
> Finally, the multiple patches business is nothing more than a
> reviewer convenience. I'm generally not excited about reviewing more than
> about 20 files at a time, on Review Board or otherwise. I assume others
> share the same opinion. As I offered on the ticket itself, I'm fine with
> accepting review on Review Board on the single large patch; I assumed
> github would make it easier, not harder.
>
> Thanks for your attention.


Thanks for the nice write up Mr. Nick.
St.Ack
