On Fri, Feb 22, 2013 at 5:40 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> I think we're getting ahead of ourselves a bit here. First and foremost,
> I'm looking for consensus that HBase should ship with tools for serializing
> Java primitive types such that the byte[] representations maintain sorted
> order. This is primarily to the benefit of users of HBase in that 3rd party
> tools can enjoy interoperability insomuch as is provided by HBase (i.e., I
> can write a Pig script that writes a long and my Hive queries can read that
> value). Furthermore, the implementations of these tools benefit from the
> order-preserving representation.
>
> Assuming this capacity is agreed to be desirable, I propose the adoption of
> this orphaned community library. I have no particular love for the name of
> the package, nor am I concerned terribly about which module it resides in.
> Personally, I think it should ship with (explicitly or as a dependency of)
> the hbase-client module that will exist in 0.96. This is my preference
> because I think the client API should be extended to use said serialization
> format directly -- finally, HBase could "support" types other than byte[].
> That would be a much larger change, however, and I am not interested in
> pressing it for this initial discussion.
>
> This introduction does not in any way affect the existing Bytes utility.
> Server components can continue to use it for marshaling their own
> primitives. This library is of interest primarily to consumers of the HBase
> client API. (I'd prefer to see Bytes deprecated from client use entirely!)
> I do not think this library or its *optional* builder pattern should be
> used inside of the RegionServer. See also HBASE-7221 for another user who
> is asking for this kind of builder pattern. The Builder and Iterator utils
> are only a convenience API, providing sugar on top of the underlying
> StructRowKey implementation. Users interested in producing or consuming
> compound objects within a tight loop need not bother with either of them.
>
> As for the implementation details and dependency on Hadoop Writables: it is
> my opinion that so long as its dependencies are compatible with the rest of
> HBase, it's no big deal. From that perspective, dependence on Hadoop
> Writable implementations is entirely reasonable for an initial inclusion.
> If, down the road, we wish to reduce dependencies (a practice I generally
> support) and in so doing it becomes useful to change this implementation
> detail, so be it. Say, for example, we want to release an hbase-client jar
> that has no dependency on any Hadoop types, I say go for it. The patch I
> have contributed tags all of these classes as "Evolving" interfaces, and
> nothing is set in stone until a release manager and the community bless a
> new release. I'm happy to work with whomever is interested toward
> modernizing implementation details once the initial code is in place.
>
> Finally, the multiple patches business is nothing more than a
> reviewer convenience. I'm generally not excited about reviewing more than
> about 20 files at a time, on Review Board or otherwise. I assume others
> share the same opinion. As I offered on the ticket itself, I'm fine with
> accepting review on Review Board on the single large patch; I assumed
> github would make it easier, not harder.
>
> Thanks for your attention.

Thanks for the nice write up Mr. Nick.
St.Ack
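
For anyone following along, here is a minimal sketch of what "byte[] representations that maintain sorted order" means for a single long. It is not taken from the proposed library, and its wire format may well differ. The class name OrderedLongSketch and its methods are made up for illustration. It uses the common trick of flipping the sign bit and writing big-endian, so that unsigned lexicographic byte comparison (the way row keys are compared) agrees with numeric order. A plain two's-complement big-endian encoding would sort negative values after positive ones.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the API of the library under discussion.
class OrderedLongSketch {

    // Encode a long as 8 big-endian bytes with the sign bit flipped,
    // so negative values get a smaller leading byte than non-negative ones.
    static byte[] encode(long value) {
        long flipped = value ^ Long.MIN_VALUE;
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) flipped; // take the low 8 bits
            flipped >>>= 8;
        }
        return out;
    }

    // Inverse of encode: reassemble big-endian, then flip the sign bit back.
    static long decode(byte[] bytes) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (bytes[i] & 0xFF);
        }
        return v ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long[] values = {42L, -1L, Long.MIN_VALUE, 0L, Long.MAX_VALUE, -1000L};
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = encode(values[i]);
        }
        // Sort by bytes only; the decoded values come out in numeric order.
        Arrays.sort(encoded, OrderedLongSketch::compareUnsigned);
        for (byte[] e : encoded) {
            System.out.println(decode(e));
        }
    }
}
```

Running main prints the six values in ascending numeric order even though the sort only ever looks at bytes. Any tool that agrees on the same encoding can read the value back with decode, which is the interoperability point made in the quoted message.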