Thanks, Wilm. I’ll look for the thread there. Obviously I didn’t realize there was so much back story: I was asking about this specific implementation because it seems to be fairly well thought out and have good commentary in the Jira ticket (HBASE-9203). At the time I thought it was mostly a dev concern. I think we’ve moved on, as you pointed out.
I'd be happy to contribute to HBase if I have something to offer. I’m just starting with this, so let’s see where it takes us.

For those of you joining us late, you can find the continuation here:

http://mail-archives.apache.org/mod_mbox/hbase-user/201503.mbox/%3C550722DA.3040009%40gmail.com%3E

-j

On 3/16/15, 2:09 PM, "Wilm Schumacher" <wilm.schumac...@gmail.com> wrote:

>Hi Joseph,
>
>I think that you kicked off this discussion because implementing an
>indexing mechanism for HBase in general is much more complicated than
>your specific problem. The people on this list want to bear every
>possible application (or at least A LOT of them) in mind. A mechanism
>that is too simple wouldn't fit the needs of most users (and thus would
>be useless); a more complicated model is harder to maintain, and you
>would have to find more coders, etc. Thus with your application question
>you seem to have walked right into a very general discussion.
>
>Furthermore, this is a user question, as you do not want to change the
>code of HBase, do you ;). I'll try an answer on the general user
>list in a couple of minutes, so more people can discuss and we can get
>traffic off this list, okay?
>
>Best wishes
>
>Wilm
>
>On 16.03.2015 at 18:46, Rose, Joseph wrote:
>> Alright, let’s see if I can get this discussion back on track.
>>
>> I have a sensibly defined table for patient data; its rowkey is simply
>> lastname:firstname, since that’s convenient for the bulk of my lookups.
>> Unfortunately I also need to efficiently find patients using an ID
>> string, whose literal value is buried in a value field. I’m sure this
>> situation is not foreign to the people on this list.
>>
>> It’s been suggested that I implement secondary indexes myself, which is
>> fine. All the research I’ve done seems to end with that suggestion, with
>> the exception of Phoenix (I don’t want the RDBMS layer) and Huawei’s
>> stuff (which seems to incite some discussion here).
>> I’m happy to put this together but I’d rather go with something that has
>> been vetted and has a larger developer community than one (i.e., ME).
>> Besides, I have a full enough plate at the moment that I’d rather not
>> have to do this, too.
>>
>> Are there constructive suggestions regarding how I can proceed with
>> HBase? Right now even a well-vetted local index would be a godsend.
>>
>> Thanks.
>>
>>
>> -j
>>
>>
>> p.s., I’ll refer you to this post for a slightly more detailed rundown
>> of how I plan to do things:
>>
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__article.gmane.org_gmane.comp.java.hadoop.hbase.user_46467&d=BQIDaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=j9wyupjEn0B7jf5KuX71llCBNN37RKmLLRc05fkUwaA79i0DrYaVuQHxlqAccDLc&m=NwQpAjAe0QcCDK7Dp0galpRYD3IvcpoK3xijbLf1WFo&s=lBW_VCH7IruBtyg3PhTjU_CW2-po9IFfiIYNMpglIRk&e=
>>
>>
>> On 3/16/15, 12:18 PM, "Michael Segel" <michael_se...@hotmail.com> wrote:
>>
>>> Joseph,
>>>
>>> The issue with Andrew goes back a few years. His comment about having
>>> a civilized discussion was a personal dig at me.
>>>
>>>
>>>> On Mar 16, 2015, at 10:38 AM, Rose, Joseph
>>>> <joseph.r...@childrens.harvard.edu> wrote:
>>>>
>>>> Michael,
>>>>
>>>> I don’t understand the invective. I’m sure you have something to
>>>> contribute, but when you bring on this tone the only thing I hear are
>>>> the snide comments.
>>>>
>>>>
>>>> -j
>>>>
>>>>
>>>> P.s., I’ll refer you to this:
>>>>
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__hbase.apache.org_book.html-23-5Fjoins&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=j9wyupjEn0B7jf5KuX71llCBNN37RKmLLRc05fkUwaA79i0DrYaVuQHxlqAccDLc&m=ujJCfI0GwgZ1Qx9be1fW7FIRqFeS-UmWVS304uhfKLs&s=2TGF0r5VvzExMqV31LmI3rQd4B8eJq_PqYKJXUqAjNk&e=
>>>>
>>>>
>>>> On 3/16/15, 11:15 AM, "Michael Segel" <michael_se...@hotmail.com>
>>>> wrote:
>>>>
>>>>> You’ll have to excuse Andy.
>>>>>
>>>>> He’s a bit slow. HBASE-13044 should have been done 2 years ago. And
>>>>> it was trivial. Just got done last month….
>>>>>
>>>>> But I digress… The long story short…
>>>>>
>>>>> HBASE-9203 was brain dead from inception. Huawei’s idea was to index
>>>>> on the region, which had two problems:
>>>>> 1) Complexity, in that they wanted to keep the index on the same
>>>>>    region server
>>>>> 2) Joins become impossible. Well, actually not impossible, but
>>>>>    incredibly slow when compared to the alternative.
>>>>>
>>>>> You really should go back to the email chain.
>>>>> Their defense (including Salesforce, who was going to push this
>>>>> approach) fell apart when you asked the simple question: how do you
>>>>> handle joins?
>>>>>
>>>>> That’s their OOPS moment. Once you start to understand that, and allow
>>>>> the index to be orthogonal to the base table, things start to come
>>>>> together.
>>>>>
>>>>> In short, you have a query either against a single table, or you’re
>>>>> doing a join. You then get the indexes and, assuming that you’re only
>>>>> using the AND predicate, it’s a simple intersection of the index
>>>>> result sets. (Since the result sets are ordered, it’s relatively
>>>>> trivial to walk through and find the intersections of N lists in a
>>>>> single pass.)
>>>>>
>>>>>
>>>>> Now you have your result set of base table row keys and you can work
>>>>> with that data (either returning the records to the client, or as
>>>>> input to a map/reduce job).
>>>>>
>>>>> That’s the 30K view. There’s more to it, but once Salesforce got the
>>>>> basic idea, they ran with it. It was really that simple concept that
>>>>> the index would be orthogonal to the base table that got them moving
>>>>> in the right direction.
>>>>>
>>>>>
>>>>> To Joseph’s point, indexing isn’t necessarily an RDBMS feature.
>>>>> However, it seems that some of the Committers are suffering from
>>>>> rectal induced hypoxia. HBASE-12853 was created not just to help solve
>>>>> the issue of ‘hot spotting’ but also to get the Committers to focus on
>>>>> bringing the solutions that they glom on in the client back to the
>>>>> server side of things.
>>>>>
>>>>> Unfortunately the last great attempt at fixing things on the server
>>>>> side was the bastardization of coprocessors, which again suffers from
>>>>> a lack of thought. This isn’t to say that allowing users to extend the
>>>>> server side functionality is wrong. (Because it isn’t.) But the
>>>>> implementation done in HBase is a tad lacking in thought.
>>>>>
>>>>> So in terms of indexing…
>>>>> Longer term picture, there have to be some fixes on the server side of
>>>>> things to allow one to associate an index (allowing for different
>>>>> types) with a base table, yet the implementation of using the index
>>>>> would end up becoming a client. And by client, I mean an external
>>>>> query engine processor that could/should sit on the cluster.
>>>>>
>>>>> But hey! What do I know?
>>>>> I gave up trying to have an intelligent/civilized conversation with
>>>>> Andrew because he just couldn’t grasp the basics. ;-)
>>>>>
>>>>>
>>>>>> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <apurt...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>> When I made that remark I was thinking of a recent discussion we had
>>>>>> at a joint Phoenix and HBase developer meetup. The difference of
>>>>>> opinion was certainly civilized. (smile) I'm not aware of any
>>>>>> specific written discussion; it may or may not exist. I'm pretty sure
>>>>>> a revival of HBASE-9203 would attract some controversy, but let me be
>>>>>> clearer this time than I was before that this is just my opinion,
>>>>>> FWIW.
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
>>>>>> joseph.r...@childrens.harvard.edu> wrote:
>>>>>>
>>>>>>> I saw that it was added to their project. I’m really not keen on
>>>>>>> bringing in all the RDBMS apparatus on top of HBase, so I decided to
>>>>>>> follow other avenues first (like trying to patch 0.98, for better or
>>>>>>> worse).
>>>>>>>
>>>>>>> That Phoenix article seems like a good breakdown of the various
>>>>>>> indexing architectures.
>>>>>>>
>>>>>>> HBASE-9203 (the ticket that deals with secondary indexes) is pretty
>>>>>>> civilized (as are most of them, it seems) so I didn’t know there
>>>>>>> were these differences of opinion. Did I miss the mailing list
>>>>>>> thread where the architectural differences were discussed?
>>>>>>>
>>>>>>>
>>>>>>> -j
>>>>> The opinions expressed here are mine, while they may reflect a
>>>>> cognitive thought, that is purely accidental.
>>>>> Use at your own risk.
>>>>> Michael Segel
>>>>> michael_segel (AT) hotmail.com
>>>>>
>>> The opinions expressed here are mine, while they may reflect a
>>> cognitive thought, that is purely accidental.
>>> Use at your own risk.
>>> Michael Segel
>>> michael_segel (AT) hotmail.com
>>>
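
For anyone who wants to see the intersection step from the quoted 11:15 AM message spelled out, here is a rough sketch of walking N ordered index result sets in a single pass to get the base table row keys that satisfy every AND predicate. It is plain Python and does not touch HBase at all; the function name `intersect_sorted` and the sample row keys are made up for illustration. It assumes each result set is already sorted ascending, has no duplicates, and compares the same way the index orders its keys (HBase orders row keys by bytes; Python strings are used here only to keep the example short).

```python
from typing import List, Sequence


def intersect_sorted(result_sets: Sequence[Sequence[str]]) -> List[str]:
    """Intersect N ascending-sorted, duplicate-free row-key lists in one pass."""
    # No lists, or any empty list, means nothing can match every predicate.
    if not result_sets or any(len(rs) == 0 for rs in result_sets):
        return []

    positions = [0] * len(result_sets)  # one cursor per result set
    matches: List[str] = []

    while True:
        # The largest key under any cursor: no smaller key can be in all lists.
        candidate = max(rs[pos] for rs, pos in zip(result_sets, positions))
        all_equal = True
        for i, rs in enumerate(result_sets):
            # Move this cursor forward past keys smaller than the candidate.
            while positions[i] < len(rs) and rs[positions[i]] < candidate:
                positions[i] += 1
            if positions[i] == len(rs):
                return matches  # one list is exhausted, so we are done
            if rs[positions[i]] != candidate:
                all_equal = False

        if all_equal:
            # Every cursor sits on the same key: it is in the intersection.
            matches.append(candidate)
            for i in range(len(positions)):
                positions[i] += 1
            if any(pos == len(rs) for rs, pos in zip(result_sets, positions)):
                return matches
```

With a patient table keyed by lastname:firstname, calling it on two hypothetical index lookups such as `["doe:jane", "roe:rich", "smith:al"]` and `["doe:jane", "smith:al", "zed:zoe"]` leaves only the row keys present in both. Each cursor only ever moves forward, which is what makes this a single pass over the inputs.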