IMHO, it would be valuable if the design considered both a global indexing solution and a local indexing solution. Both are useful in different circumstances. The global indexing design plus the application integration points could be derived from Jesse's work with his reference implementation in Phoenix - the global indexing code has no Phoenix dependencies and clearly defined integration points.
Thanks,
James

On Jan 9, 2014, at 6:36 AM, Jesse Yates <jesse.k.ya...@gmail.com> wrote:

> Yes, that was a big concern I had as well.
>
> It's not clear how that will work with a large number of indexes; if people
> have one index, they will want more than one. To not plan for that seems
> like an incomplete implementation to me. In a horizontally scalable system
> like HBase, lots of buddy regions aren't going to work out well.* Once we
> have regions that cannot be collocated, the extra RPC time starts to be the
> biggest factor (as the doc points out) and we are back to what Phoenix is
> already doing.**
>
> But I'm probably missing something here in what makes it different?
>
> For folks that haven't been following the issue, some high-level "how it all
> kinda works" would be helpful from the championing committers; that's a long
> doc to get through and grok :). How similar is this to the work currently
> done by the existing indexing implementations (huawei, Phoenix, ngdata)? The
> doc doesn't really nail down the interactions, but instead jumps right in
> after describing why SI should be added.
>
> Agree this would be super useful, but don't want to waste too much work
> reinventing the wheel or doing the wrong thing. Further, this impl quickly
> starts to lead down the query optimization path, which gets HBase away from
> its core "be a great byte store".
>
> Like I said, I'm all for secondary indexes in HBase and think this is a
> great push. I don't mean to rain on any parades.
>
> - jesse
>
> * but a smart way to specify region collocation? That I can get behind as
> it would unify a couple of different indexing impls (e.g. Phoenix would
> consider using it to help make indexing faster - RPCs do suck).
>
> ** for instance, the doc talks about how to implement indexing for
> floats... That might be a default impl, but for use cases like Phoenix this
> would break all our current encodings. We handled this in the indexing impl
> by making the builder pluggable for different use cases to support
> different encodings. I feel like a lot of the code for this kind of SI
> impl is already in Phoenix and has been working and fast for several months
> now; it's surprisingly tricky, especially with the delete cases and
> timestamp manipulation issues.
>
>
> On Thursday, January 9, 2014, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> wrote:
>
>> Could you explain how the 1-1 association between user and index table
>> regions is maintained? I wasn't able to understand it fully from the
>> document.
>>
>> ----- Original Message -----
>> From: Ted Yu <dev@hbase.apache.org>
>> To: dev@hbase.apache.org
>> At: Jan 8, 2014 3:41:40 PM
>>
>> Hi,
>> Secondary index support is a frequently requested feature.
>>
>> Please find the updated design doc here:
>>
>> https://issues.apache.org/jira/secure/attachment/12621909/SecondaryIndex%20Design_Updated_2.pdf
>>
>> HBASE-9203 is the umbrella JIRA.
>>
>> Implementation patch was attached to HBASE-10222
>>
>> Thanks to Rajesh who works on this feature.
>>
>> Cheers
>>
>
>
> --
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com