Mike,

Yes, you're mistaken:
- Secondary indexes in Phoenix are orthogonal to the base table. They're in a separate table (http://phoenix.incubator.apache.org/secondary_indexing.html).
- Phoenix has joins. They're in our master branch, with a release scheduled for next month.
- Numeric strings? Not a use case for indexing numeric data? Have you ever seen a number used as an ID?

Thanks,
James

On Mon, Jan 20, 2014 at 8:50 AM, Michael Segel <michael_se...@hotmail.com> wrote:

> Indexes tend to be orthogonal to the base table, not to mention if you’re
> using an inverted table for an index, your index table would be much
> thinner than your base table.
>
> Having said that, the solution proposed by Yu, Taylor and others only
> works if you want to use the index to help with server-side filtering, and
> it misses the boat on the larger and broader picture of improving query
> optimization and joins.
>
> HINT: Unless I am mistaken… until you treat the index as orthogonal to the
> base table, you will always lag the performance of traditional MPP DWs like
> Informix XPS. (Now part of IBM’s IM pillar.)
>
> In addition, until you fix coprocessors in general, you will have
> scalability and performance issues.
> (Note that you can write a coprocessor to create a sandbox and separate
> the co-process from the RS JVM; however, it would be better if it were part
> of the underlying coprocessor code.)
>
> The current implementation makes joins worthless.
> (Note that in prior discussions, Phoenix doesn’t do joins…)
> Here’s why:
> In order to do a join, if you use the proposed index, you have to first
> reduce each index into a single, sort-ordered set. Then you can take the
> intersection of the index result sets. The final set would be in sort
> order and a subset of the total rows. You can then fetch the rows and still
> do a server-side filter before returning the ultimate result set.
>
> It’s that first step of reducing each result set into a single
> sort-ordered set that takes a lot of effort.
>
>
> On a side note… there’s been some mention of ordering floats. Again, just
> a word of caution… there isn’t a really strong use case for indexing
> numeric data types. Period. And to be very, very clear, there is a
> distinction between numeric strings and numeric data types.
>
> -Mike
>
> PS. Because of my role as a consultant, I am very, very limited in what I
> can say and contribute. I don’t own my work product, my clients do. Take
> what I say with a grain of salt. I’m just a skinny little boy from
> Cleveland Ohio, come to chase your beers and drink your women… ;-)
>
> On Jan 9, 2014, at 10:48 AM, James Taylor <jtay...@salesforce.com> wrote:
>
> > IMHO, it would be valuable if the design considered both a global
> > indexing solution and a local indexing solution. Both are useful in
> > different circumstances. The global indexing design plus the
> > application integration points could be derived from Jesse's work with
> > his reference implementation in Phoenix - the global indexing code has
> > no Phoenix dependencies and clearly defined integration points.
> >
> > Thanks,
> > James
> >
> > On Jan 9, 2014, at 6:36 AM, Jesse Yates <jesse.k.ya...@gmail.com> wrote:
> >
> >> Yes, that was a big concern I had as well.
> >>
> >> It's not clear how that will work with a large number of indexes; if
> >> people have one index, they will want more than one. To not plan for
> >> that seems like an incomplete implementation to me. In a horizontally
> >> scalable system like HBase, lots of buddy regions aren't going to work
> >> out well.* Once we have regions that cannot be collocated, the extra
> >> RPC time starts to be the biggest factor (as the doc points out) and we
> >> are back to what Phoenix is already doing.**
> >>
> >> But I'm probably missing something here in what makes it different?
> >>
> >> For folks that haven't been following the issue, some high-level "how it
> >> all kinda works" would be helpful from the championing committers;
> >> that's a long doc to get through and grok :). How similar is this to the
> >> work currently done by the existing indexing implementations (huawei,
> >> Phoenix, ngdata)? The doc doesn't really nail down the interactions, but
> >> instead jumps right in after describing why SI should be added.
> >>
> >> Agree this would be super useful, but don't want to waste too much work
> >> reinventing the wheel or doing the wrong thing. Further, this impl
> >> quickly starts to lead down the query optimization path, which gets
> >> HBase away from its core "be a great byte store".
> >>
> >> Like I said, I'm all for secondary indexes in HBase and think this is a
> >> great push. I don't mean to rain on any parades.
> >>
> >> - jesse
> >>
> >> * but a smart way to specify region collocation? That I can get behind,
> >> as it would unify a couple of different indexing impls (e.g., Phoenix
> >> would consider using it to help make indexing faster - RPCs do suck).
> >>
> >> ** for instance, the doc talks about how to implement indexing for
> >> floats... That might be a default impl, but for use cases like Phoenix
> >> this would break all our current encodings. We handled this in the
> >> indexing impl by making the builder pluggable for different use cases to
> >> support different encodings. I feel like a lot of the code for this kind
> >> of SI impl is already in Phoenix and has been working and fast for
> >> several months now; it's surprisingly tricky, especially with the delete
> >> cases and timestamp manipulation issues.
> >>
> >>
> >> On Thursday, January 9, 2014, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> >> wrote:
> >>
> >>> Could you explain how the 1-1 association between user and index table
> >>> regions is maintained? I wasn't able to understand it fully from the
> >>> document.
> >>>
> >>> ----- Original Message -----
> >>> From: Ted Yu <dev@hbase.apache.org>
> >>> To: dev@hbase.apache.org
> >>> At: Jan 8, 2014 3:41:40 PM
> >>>
> >>> Hi,
> >>> Secondary index support is a frequently requested feature.
> >>>
> >>> Please find the updated design doc here:
> >>>
> >>> https://issues.apache.org/jira/secure/attachment/12621909/SecondaryIndex%20Design_Updated_2.pdf
> >>>
> >>> HBASE-9203 is the umbrella JIRA.
> >>>
> >>> Implementation patch was attached to HBASE-10222
> >>>
> >>> Thanks to Rajesh who works on this feature.
> >>>
> >>> Cheers
> >>>
> >>
> >>
> >> --
> >> -------------------
> >> Jesse Yates
> >> @jesse_yates
> >> jyates.github.com
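
For readers following the join discussion quoted above, the steps Mike describes (reduce each index result into one sort-ordered set, intersect the sets, fetch the rows, then filter) can be sketched roughly as below. This is a minimal in-memory Python illustration, not Phoenix or HBase code. The function names, the lists standing in for per-region index scan results, and the dict standing in for the base table are all assumptions made for the example.

```python
import heapq
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, int]  # hypothetical row shape, for illustration only


def merge_sorted_runs(runs: Iterable[List[bytes]]) -> Iterator[bytes]:
    """Step 1: reduce several sorted runs of row keys (assumed here to be
    one run per region of an index) into one sorted, duplicate-free stream."""
    previous = None
    for key in heapq.merge(*runs):
        if key != previous:
            yield key
            previous = key


def intersect_sorted(left: Iterable[bytes], right: Iterable[bytes]) -> Iterator[bytes]:
    """Step 2: intersect two sorted, duplicate-free key streams in one pass."""
    left_it, right_it = iter(left), iter(right)
    a, b = next(left_it, None), next(right_it, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(left_it, None), next(right_it, None)
        elif a < b:
            a = next(left_it, None)
        else:
            b = next(right_it, None)


def index_join(
    index_a_runs: Iterable[List[bytes]],
    index_b_runs: Iterable[List[bytes]],
    base_table: Dict[bytes, Row],
    row_filter: Callable[[Row], bool],
) -> List[Tuple[bytes, Row]]:
    """Steps 3 and 4: fetch the rows for the surviving keys, then apply a
    final filter (standing in for the server-side filter)."""
    keys = intersect_sorted(
        merge_sorted_runs(index_a_runs), merge_sorted_runs(index_b_runs)
    )
    return [(k, base_table[k]) for k in keys if row_filter(base_table[k])]


if __name__ == "__main__":
    # Two hypothetical indexes, each returning sorted keys from two regions.
    index_a = [[b"r1", b"r4"], [b"r2", b"r7"]]
    index_b = [[b"r2", b"r4"], [b"r4", b"r9"]]
    base = {b"r2": {"qty": 5}, b"r4": {"qty": 20}}
    print(index_join(index_a, index_b, base, lambda row: row["qty"] > 10))
```

In this toy version the merge in step 1 is cheap because everything is already in memory. In a distributed setting that merge has to pull sorted results from every region of each index, which is presumably the cost Mike is pointing at.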