Re: Design review: Secondary index support through coprocess

James Taylor Mon, 20 Jan 2014 09:34:36 -0800

Mike,
Yes, you're mistaken:
- secondary indexes in Phoenix are orthogonal to the base table. They're in
a separate table (
http://phoenix.incubator.apache.org/secondary_indexing.html).
- Phoenix has joins. They're in our master branch with a release scheduled
for next month
- numeric strings? Not a use case for indexing numeric data? Have you ever
seen a number used as an ID?
Thanks,
James



On Mon, Jan 20, 2014 at 8:50 AM, Michael Segel <michael_se...@hotmail.com> wrote:

> Indexes tend to be orthogonal to the base table, not to mention if you’re
> using an inverted table for an index, your index table would be much
> thinner than your base table.
>
> Having said that, the solution proposed by Yu, Taylor and others only
> works if you want to use the index to help on server side filtering and
> misses the boat on the larger and broader picture of improving query
> optimization and joins.
>
> HINT: Unless I am mistaken… until you treat the index as orthogonal to the
> base table, you will always lag the performance of traditional MPP DWs like
> Informix XPS (now part of IBM’s IM pillar).
>
> In addition, until you fix coprocessors in general, you will have
> scalability and performance issues.
> (Note that you can write a coprocessor to create a sandbox and separate
> the co-process from the RS JVM; however, it would be better if it were part
> of the underlying coprocessor code.)
>
> The current implementation makes joins worthless.
> (Note that in prior discussions,  Phoenix doesn’t do joins…)
> Here’s why:
> In order to do a join, if you use the proposed index, you have to first
> reduce each index into a single, sort-ordered set.  Then you can take the
> intersection of the index result sets.  The final set would be in sort
> order and a subset of the total rows. You can then fetch the rows and still
> do a server side filter before returning the ultimate result set.
>
> It's that first step of reducing each result set into a single
> sort-ordered set that takes a lot of effort.
>
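[Illustration: the join steps described above (reduce each index's hits to
one sort-ordered set, intersect the sets, then fetch rows) can be sketched
roughly as follows. This is only a minimal Python sketch under stated
assumptions, not code from HBase, Phoenix, or the design doc. The function
names, the per-region lists, and the row keys are all hypothetical, and each
region is assumed to return its index hits already sorted by row key.]

```python
import heapq
from typing import Iterable, Iterator, List


def merge_region_results(per_region: List[List[bytes]]) -> Iterator[bytes]:
    """Reduce per-region index hits into one sort-ordered stream.

    Assumption: each inner list is already sorted by row key.
    """
    return heapq.merge(*per_region)


def intersect_sorted(a: Iterable[bytes], b: Iterable[bytes]) -> List[bytes]:
    """Intersect two sort-ordered streams of row keys in a single pass."""
    out: List[bytes] = []
    it_a, it_b = iter(a), iter(b)
    x, y = next(it_a, None), next(it_b, None)
    while x is not None and y is not None:
        if x == y:
            out.append(x)
            x, y = next(it_a, None), next(it_b, None)
        elif x < y:
            x = next(it_a, None)
        else:
            y = next(it_b, None)
    return out


# Hypothetical hits from two indexes, each spread over two regions.
color_hits = [[b"row02", b"row07"], [b"row01", b"row05"]]
size_hits = [[b"row05", b"row09"], [b"row02"]]

candidates = intersect_sorted(
    merge_region_results(color_hits),
    merge_region_results(size_hits),
)
print(candidates)  # row keys to fetch, then filter server side
```

[The intersection itself is a cheap linear pass; the merge across regions is
the step the message above singles out as expensive.]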
>
> On a side note… there’s been some mention of ordering floats. Again, just
> a word of caution… there isn’t a really strong use case for indexing
> numeric data types. Period.  And to be very, very clear, there is a
> distinction between numeric strings and numeric data types.
>
> -Mike
>
> PS. Because of my role as a consultant, I am very, very limited in what I
> can say and contribute. I don’t own my work product, my clients do. Take
> what I say with a grain of salt.  I’m just a skinny little boy from
> Cleveland Ohio, come to chase your beers and drink your women… ;-)
>
> On Jan 9, 2014, at 10:48 AM, James Taylor <jtay...@salesforce.com> wrote:
>
> > IMHO, it would be valuable if the design considered both a global
> > indexing solution and a local indexing solution. Both are useful in
> > different circumstances. The global indexing design plus the
> > application integration points could be derived from Jesse's work with
> > his reference implementation in Phoenix - the global indexing code has
> > no Phoenix dependencies and clearly defined integration points.
> >
> > Thanks,
> > James
> >
> > On Jan 9, 2014, at 6:36 AM, Jesse Yates <jesse.k.ya...@gmail.com> wrote:
> >
> >> Yes, that was a big concern I had as well.
> >>
> >> It's not clear how that will work with a large number of indexes; if
> >> people have one index, they will want more than one. To not plan for
> >> that seems like an incomplete implementation to me. In a horizontally
> >> scalable system like HBase, lots of buddy regions aren't going to work
> >> out well.* Once we have regions that cannot be collocated, the extra RPC
> >> time starts to be the biggest factor (as the doc points out) and we are
> >> back to what Phoenix is already doing**.
> >>
> >> But I'm probably missing something here in what makes it different?
> >>
> >> For folks that haven't been following the issue, some high-level "how it
> >> all kinda works" would be helpful from the championing committers; that's
> >> a long doc to get through and grok :). How similar is this to the work
> >> currently done by the existing indexing implementations (Huawei, Phoenix,
> >> NGDATA)? The doc doesn't really nail down the interactions, but instead
> >> jumps right in after describing why SI should be added.
> >>
> >> Agree this would be super useful, but don't want to waste too much work
> >> reinventing the wheel or doing the wrong thing. Further, this impl
> >> quickly starts to lead down the query optimization path, which gets
> >> HBase away from its core "be a great byte store".
> >>
> >> Like I said, I'm all for secondary indexes in HBase and think this is a
> >> great push. I don't mean to rain on any parades.
> >>
> >> - jesse
> >>
> >> * But a smart way to specify region collocation? That I can get behind,
> >> as it would unify a couple different indexing impls (e.g. Phoenix would
> >> consider using it to help make indexing faster - RPCs do suck).
> >>
> >> ** For instance, the doc talks about how to implement indexing for
> >> floats... That might be a default impl, but for use cases like Phoenix
> >> this would break all our current encodings. We handled this in the
> >> indexing impl by making the builder pluggable for different use cases to
> >> support different encodings. I feel like a lot of the code for this kind
> >> of SI impl is already in Phoenix and has been working and fast for
> >> several months now; it's surprisingly tricky, especially with the delete
> >> cases and timestamp manipulation issues.
> >>
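[Illustration: the point about float indexing and encodings rests on the fact
that HBase compares keys as raw bytes, so an indexed float has to be encoded
so that byte order matches numeric order. Below is a minimal Python sketch of
one widely used sign-flip encoding for IEEE 754 doubles. It is an assumption
for illustration only: it is not confirmed to be the encoding in the design
doc or the one Phoenix uses, and the function names are hypothetical.]

```python
import struct

SIGN_BIT = 1 << 63
ALL_BITS = 0xFFFFFFFFFFFFFFFF


def encode_double(value: float) -> bytes:
    """Encode a double so that bytewise order matches numeric order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    if bits & SIGN_BIT:
        bits ^= ALL_BITS   # negative: flip every bit to reverse the order
    else:
        bits ^= SIGN_BIT   # non-negative: set the sign bit to sort after negatives
    return struct.pack(">Q", bits)


def decode_double(key: bytes) -> float:
    """Invert encode_double."""
    (bits,) = struct.unpack(">Q", key)
    if bits & SIGN_BIT:
        bits ^= SIGN_BIT   # was non-negative
    else:
        bits ^= ALL_BITS   # was negative
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


values = [3.5, -2.0, 0.0, -10.25, 1e9]
by_key = sorted(values, key=encode_double)  # sort by encoded bytes
print(by_key)
```

[Two implementations that pick different encodings produce incompatible index
keys, which is why a pluggable builder matters here.]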
> >>
> >> On Thursday, January 9, 2014, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> >> wrote:
> >>
> >>> Could you explain how the 1-1 association between user and index table
> >>> regions is maintained? I wasn't able to understand it fully from the
> >>> document.
> >>>
> >>> ----- Original Message -----
> >>> From: Ted Yu <dev@hbase.apache.org>
> >>> To: dev@hbase.apache.org
> >>> At: Jan 8, 2014 3:41:40 PM
> >>>
> >>> Hi,
> >>> Secondary index support is a frequently requested feature.
> >>>
> >>> Please find the updated design doc here:
> >>>
> >>>
> >>> https://issues.apache.org/jira/secure/attachment/12621909/SecondaryIndex%20Design_Updated_2.pdf
> >>>
> >>> HBASE-9203 is the umbrella JIRA.
> >>>
> >>> Implementation patch was attached to HBASE-10222
> >>>
> >>> Thanks to Rajesh who works on this feature.
> >>>
> >>> Cheers
> >>>
> >>
> >>
> >> --
> >> -------------------
> >> Jesse Yates
> >> @jesse_yates
> >> jyates.github.com
> >
>
>
