There is nothing wrong with co-locating index and data on the same RS. This will greatly improve single-table search. Joins are evil anyway. Leave them to the RDBMS zoo.
-Vlad

On Mon, Mar 16, 2015 at 8:14 AM, Michael Segel <michael_se...@hotmail.com> wrote:

> You’ll have to excuse Andy.
>
> He’s a bit slow. HBASE-13044 should have been done 2 years ago. And it
> was trivial. Just got done last month….
>
> But I digress… The long story short…
>
> HBASE-9203 was brain dead from inception. Huawei’s idea was to index on
> the region, which had two problems:
> 1) Complexity in that they wanted to keep the index on the same region
> server
> 2) Joins become impossible. Well, actually not impossible, but incredibly
> slow when compared to the alternative.
>
> You really should go back to the email chain.
> Their defense (including Salesforce who was going to push this approach)
> fell apart when you asked the simple question: how do you handle joins?
>
> That’s their OOPS moment. Once you start to understand that, and allow
> the index to be orthogonal to the base table, things start to come
> together.
>
> In short, your query is either against a single table or a join.
> You then get the indexes and, assuming that you’re only using
> the AND predicate, it’s a simple intersection of the index result sets.
> (Since the result sets are ordered, it’s relatively trivial to walk through
> and find the intersection of N lists in a single pass.)
>
> Now you have your result set of base table row keys and you can work with
> that data. (Either returning the records to the client, or as input to a
> map/reduce job.)
>
> That’s the 30K-foot view. There’s more to it, but once Salesforce got the
> basic idea, they ran with it. It was really that simple concept that the
> index would be orthogonal to the base table that got them moving in the
> right direction.
>
> To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However,
> it seems that some of the Committers are suffering from rectal induced
> hypoxia.
> HBASE-12853 was created not just to help solve the issue of ‘hot
> spotting’ but also to get the Committers to focus on bringing the solutions
> that they glom onto in the client back to the server side of things.
>
> Unfortunately the last great attempt at fixing things on the server side
> was the bastardization of coprocessors which, again, suffers from a lack
> of thought. This isn’t to say that allowing users to extend the server
> side functionality is wrong (because it isn’t), but that the
> implementation done in HBase is a tad lacking in thought.
>
> So in terms of indexing…
> Longer term picture, there have to be some fixes on the server side of
> things to allow one to associate an index (allowing for different types) to
> a base table, yet the implementation of using the index would end up
> becoming a client. And by client, it would be an external query engine
> processor that could/should sit on the cluster.
>
> But hey! What do I know?
> I gave up trying to have an intelligent/civilized conversation with Andrew
> because he just couldn’t grasp the basics. ;-)
>
> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <apurt...@apache.org> wrote:
> >
> > When I made that remark I was thinking of a recent discussion we had at a
> > joint Phoenix and HBase developer meetup. The difference of opinion was
> > certainly civilized. (smile) I'm not aware of any specific written
> > discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203
> > would attract some controversy, but let me be clearer this time than I was
> > before that this is just my opinion, FWIW.
> >
> > On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <joseph.r...@childrens.harvard.edu> wrote:
> >
> >> I saw that it was added to their project. I’m really not keen on bringing
> >> in all the RDBMS apparatus on top of hbase, so I decided to follow other
> >> avenues first (like trying to patch 0.98, for better or worse.)
> >>
> >> That Phoenix article seems like a good breakdown of the various indexing
> >> architectures.
> >>
> >> HBASE-9203 (the ticket that deals with secondary indexes) is pretty civilized (as
> >> are most of them, it seems) so I didn’t know there were these differences
> >> of opinion. Did I miss the mailing list thread where the architectural
> >> differences were discussed?
> >>
> >> -j
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
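
For reference, the AND-predicate step Michael describes (intersecting N ordered index result sets by walking each one forward once) could look roughly like the sketch below. It assumes each index lookup has already returned base-table row keys in ascending order with no duplicates. The function name and the sample keys are made up for illustration, and nothing here is HBase or Phoenix API.

```python
from typing import List, Sequence


def intersect_sorted(index_results: Sequence[Sequence[bytes]]) -> List[bytes]:
    """Intersect N ascending, duplicate-free row-key lists.

    Each list has its own cursor that only ever moves forward, so every
    list is walked at most once.
    """
    if not index_results or any(len(keys) == 0 for keys in index_results):
        return []

    positions = [0] * len(index_results)  # one cursor per index result set
    matches: List[bytes] = []

    while True:
        # The largest key under any cursor is the smallest key that could
        # still appear in every list.
        candidate = max(keys[pos] for keys, pos in zip(index_results, positions))

        all_equal = True
        for i, keys in enumerate(index_results):
            # Skip keys that are too small to match the candidate.
            while positions[i] < len(keys) and keys[positions[i]] < candidate:
                positions[i] += 1
            if positions[i] == len(keys):
                return matches  # one list is exhausted, nothing more can match
            if keys[positions[i]] != candidate:
                all_equal = False

        if all_equal:
            matches.append(candidate)
            # Move every cursor past the matched key.
            for i, keys in enumerate(index_results):
                positions[i] += 1
                if positions[i] == len(keys):
                    return matches


# Hypothetical row keys returned by three separate index lookups.
city_index = [b"row02", b"row05", b"row09", b"row12"]
status_index = [b"row01", b"row05", b"row09", b"row11", b"row12"]
tier_index = [b"row05", b"row12", b"row20"]

row_keys = intersect_sorted([city_index, status_index, tier_index])
```

The resulting `row_keys` list is the set of base-table row keys that would then be fetched for the client or fed to a map/reduce job. Where the intersection runs (client, coprocessor, or an external query engine) is exactly the design question debated in this thread, and the sketch takes no position on it.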