I think we're getting ahead of ourselves a bit here. First and foremost, I'm looking for consensus that HBase should ship with tools for serializing Java primitive types such that the byte[] representations maintain sorted order. This primarily benefits HBase users, in that third-party tools gain interoperability insofar as HBase provides it (i.e., I can write a Pig script that writes a long and my Hive queries can read that value). Furthermore, the implementations of these tools benefit from the order-preserving representation.
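To make "order-preserving" concrete, here is a minimal Java sketch. It is not Orderly's actual wire format, and the class and method names are made up for illustration. It assumes a fixed-width, big-endian encoding with the sign bit flipped, so that unsigned lexicographic comparison of the bytes (which is how HBase orders row keys) agrees with the numeric order of signed longs:

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Orderly or HBase.
class OrderedLongSketch {

    // Encode a signed long as 8 big-endian bytes with the sign bit flipped,
    // so negative values sort before positive ones byte-wise.
    static byte[] encode(long value) {
        long flipped = value ^ Long.MIN_VALUE; // flip only the sign bit
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) flipped;
            flipped >>>= 8;
        }
        return out;
    }

    // Reverse of encode: rebuild the long, then flip the sign bit back.
    static long decode(byte[] bytes) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (bytes[i] & 0xFFL);
        }
        return v ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long[] values = {42L, -1L, Long.MIN_VALUE, 0L, Long.MAX_VALUE, -100L};
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = encode(values[i]);
        }
        // Sort by raw bytes only, then decode to show the resulting order.
        Arrays.sort(encoded, OrderedLongSketch::compareUnsigned);
        for (byte[] e : encoded) {
            System.out.println(decode(e));
        }
    }
}
```

By contrast, a plain two's-complement big-endian encoding places negative values after positive ones under the same byte-wise comparison, which is the kind of mismatch an order-preserving format is meant to avoid.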
Assuming this capacity is agreed to be desirable, I propose the adoption of this orphaned community library. I have no particular love for the name of the package, nor am I terribly concerned about which module it resides in. Personally, I think it should ship with (explicitly or as a dependency of) the hbase-client module that will exist in 0.96. This is my preference because I think the client API should be extended to use said serialization format directly -- finally, HBase could "support" types other than byte[]. That would be a much larger change, however, and I am not interested in pressing it for this initial discussion.
This introduction does not in any way affect the existing Bytes utility. Server components can continue to use it for marshaling their own primitives. This library is of interest primarily to consumers of the HBase client API. (I'd prefer to see Bytes deprecated from client use entirely!) I do not think this library or its *optional* builder pattern should be used inside of the RegionServer. See also HBASE-7221 for another user who is asking for this kind of builder pattern.
The Builder and Iterator utils are only a convenience API, providing sugar on top of the underlying StructRowKey implementation. Users interested in producing or consuming compound objects within a tight loop need not bother with either of them.
As for the implementation details and dependency on Hadoop Writables: it is my opinion that so long as its dependencies are compatible with the rest of HBase, it's no big deal. From that perspective, dependence on Hadoop Writable implementations is entirely reasonable for an initial inclusion. If, down the road, we wish to reduce dependencies (a practice I generally support) and in so doing it becomes useful to change this implementation detail, so be it. Say, for example, we want to release an hbase-client jar that has no dependency on any Hadoop types, I say go for it.
The patch I have contributed tags all of these classes as "Evolving" interfaces, and nothing is set in stone until a release manager and the community bless a new release. I'm happy to work with whoever is interested toward modernizing implementation details once the initial code is in place.
Finally, the multiple patches business is nothing more than a reviewer convenience. I'm generally not excited about reviewing more than about 20 files at a time, on Review Board or otherwise. I assume others share the same opinion. As I offered on the ticket itself, I'm fine with accepting review on Review Board on the single large patch; I assumed GitHub would make it easier, not harder.
Thanks for your attention.
-n

On Fri, Feb 22, 2013 at 4:48 PM, Matt Corgan <[email protected]> wrote:
> I agree with Jonathan that ideally this would not depend on hbase or hadoop. Could we just replace Hadoop's BytesWritable with a new class that does the same thing?
>
> I also have a concern about the way it builds the multi-field byte[] by allocating somewhat expensive Builder objects, etc. It's suitable for application level code, but most of the innards of hbase regionserver should be using tighter code for best performance and less garbage. Perhaps in a future issue we can separate the builder wrappers from their internal byte converters so that hbase-server can use the lower-level byte converters without the builder overhead.
>
> On Fri, Feb 22, 2013 at 4:33 PM, Jonathan Hsieh <[email protected]> wrote:
>
> > I think I misspoke slightly but basically agree with Matt's notion that this would end up being the place to pickup the orderly jar and that ideally it has no hbase-* dependencies.
> >
> > I actually feel that the hbase-orderly module is a sibling to hbase-common and hbase-client. My initial thought is that this is ideally not depended upon by the hbase-client. An app would use hbase-orderly and hbase-client.
> >
> > A simplified module dependency graph (excluding some details) would be (where -> == "depends on")
> >
> > app -> hbase-client, hbase-orderly
> > hbase-client -> hbase-protocol, hbase-common, *-compat
> > hbase-common -> none of the hbase-*
> > hbase-orderly -> none of the hbase-*
> >
> > I don't quite understand what the multiple patches are for the module work (or is this follow on stuff that uses this)? can you explain what the breakdown would be? since it isn't committed yet and should be self contained, just do the big import as a single patch?
> >
> > Thanks for bringing this up for discussion Nick.
> >
> > Jon.
> >
> > On Fri, Feb 22, 2013 at 3:13 PM, Nick Dimiduk <[email protected]> wrote:
> >
> > > On Fri, Feb 22, 2013 at 10:00 AM, Matt Corgan <[email protected]> wrote:
> > >
> > > > To nitpick a little it wouldn't quite be a sibling of hbase-client because hbase-client depends on hbase-common and hbase-protocol
> > >
> > > Actually, quite the contrary. I don't see this as being an external module as much as integral to the client's use of HBase (read "client" as "application consuming HBase", not "the HBase RPC client implementation"). Further, once HBase provides a suitable serialization format for primitives, why not push them into the client API? IMHO, HBase really should provide basic types for users at the Mutation layer. That, however, belongs in an entirely separate ticket.
> > >
> > > On Fri, Feb 22, 2013 at 9:32 AM, Elliott Clark <[email protected]> wrote:
> > > >
> > > > > Yep the client will be fully separated as soon as rpc changes are stabilized. Until then keeping up the move patch was just too onerous.
> > > > >
> > > > > On Fri, Feb 22, 2013 at 6:31 AM, Jonathan Hsieh <[email protected]> wrote:
> > > > >
> > > > > > Nick,
> > > > > >
> > > > > > I'm +1 for it having its own module, and being a sibling of hbase-client. I'm assuming the client stuff will happen before we release 0.96 since it has been started.
> > > > > >
> > > > > > Jon.
> > > > > >
> > > > > > On Fri, Feb 22, 2013 at 6:13 AM, Nick Dimiduk <[email protected]> wrote:
> > > > > >
> > > > > > > You're absolutely correct: this library introduces client-side conventions and is not needed from within the HMaster or RegionServer. Is the consensus that it should reside in its own module or be a sibling to the o.a.h.hbase.client source tree? I'm a little confused by the current state of the modules; hbase-client looks empty while o.a.h.hbase.client sits under hbase-server.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Nick
> > > > > > >
> > > > > > > On Thu, Feb 21, 2013 at 11:56 PM, Jonathan Hsieh <[email protected]> wrote:
> > > > > > >
> > > > > > > > So I buy the argument about this being included in hbase, but several of the questions still stand --
> > > > > > > >
> > > > > > > > Why is this part of hbase-common? shouldn't this be just a dependency of hbase-client module? Does the hbase-server side need to depend on this?
> > > > > > > >
> > > > > > > > Since this is a large import of a currently isolated library, why not make it a separate module instead of part of hbase-common? This would enforce a boundary that will prevent pollution from circular dependencies.
> > > > > > > >
> > > > > > > > Jon.
> > > > > > > >
> > > > > > > > On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > I think this belongs in core HBase, as a replacement to Bytes, which should be deprecated eventually. We have a Bytes utility which is supposed to convert basic java types to byte[]'s, but it does not work for signed numbers.
> > > > > > > > >
> > > > > > > > > We already know that all of the clients, Hive, Pig, Phoenix, have to have at least java type -> byte[] conversion utilities, and I think it is HBase's job to supply one so that different clients can interoperate. Since internally we are also relying on serializing java types, we need that library in the core.
> > > > > > > > >
> > > > > > > > > BTW, I also think that we need to have a SQL-type to java type to byte[] layer, but that is another discussion.
> > > > > > > > >
> > > > > > > > > Enis
> > > > > > > > >
> > > > > > > > > On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Nick,
> > > > > > > > > >
> > > > > > > > > > While I believe having an order-preserving canonical serialization is a good idea, from doing a read of the mail and a skim of the jira it is not clear to me why this is inside hbase as part of hbase-common.
> > > > > > > > > >
> > > > > > > > > > Why isn't this part of a library on top of hbase (a dependency for Pig/Hive) instead of "inside" hbase?
> > > > > > > > > > Can't this functionality be done just from the client level?
> > > > > > > > > > What's the end goal here? Is the goal here to replace the Bytes.toBytes(*) methods to enforce the ordering?
> > > > > > > > > > If HBase has two mutually incompatible encodings "built-in", how does a dev know to use one or the other later on?
> > > > > > > > > > If this is essentially a mega import of a library (300k.. yikes), why not make it a separate module instead of part of common?
> > > > > > > > > >
> > > > > > > > > > Jon.
> > > > > > > > > >
> > > > > > > > > > On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi everyone,
> > > > > > > > > > >
> > > > > > > > > > > I'm of the opinion that HBase should provide a mechanism for serializing common java types such that the serialized format sorts according to the natural ordering of the type. I think many application efforts end up building a custom, partial implementation of this kind of functionality on their own. I think HBase should provide a canonical implementation of such a serialization format so that third-parties can reliably build on top of HBase. Not just user applications, but other tools like Pig and Hive are also enabled. Implementations for HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>, HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>, or HIVE-2903 <https://issues.apache.org/jira/browse/HIVE-2903> could be compatible with similar features in Pig.
> > > > > > > > > > >
> > > > > > > > > > > After implementing something similar on multiple occasions, I stumbled across the Orderly <https://github.com/ndimiduk/orderly> library. It also appears to have been adopted by other large projects, including Lily <https://github.com/NGDATA/orderly>. I've engaged the library's author for some improvements only to find out he's now at Google and will no longer be maintaining it. Thus, I propose we take it into HBase.
> > > > > > > > > > >
> > > > > > > > > > > HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692> includes a patch that introduces Orderly into hbase-common under the orderly namespace. I have an associated branch on GitHub <https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization> wherein I've broken the patch out into multiple commits to ease review. Please take a few minutes to give it a look.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Nick
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > // Jonathan Hsieh (shay)
> > > > > > > > > > // Software Engineer, Cloudera
> > > > > > > > > > // [email protected]
