Re: [DISCUSS] Suggestions for Phoenix from HBaseCon Asia notes

James Taylor Tue, 28 Aug 2018 14:01:47 -0700

Glad to hear this was discussed at HBaseCon. The most common request I've
seen is to be able to write Phoenix-compatible data from other,
non-Phoenix services/projects, mainly because row-by-row updates (even when
batched) can be a bottleneck. This is not feasible using low-level
constructs because of all the features provided by Phoenix: secondary
indexes, composite row keys, encoded columns, storage formats, salting,
ascending/descending row keys, array support, etc. The most feasible way to
accomplish writes outside of Phoenix is to use UPSERT VALUES followed by
PhoenixRuntime#getUncommittedDataIterator to get the Cells that would be
committed (followed by rolling back the uncommitted data). This maintains
Phoenix's abstraction and minimizes any overhead (the cost of parsing is
negligible). You can control how often the schema is pulled over from the
server through the UPDATE_CACHE_FREQUENCY declaration.
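
For illustration, the statements involved in that flow might look like the
following. This is a minimal sketch, not taken from the thread: the table
name EVENTS, its columns, and the 900000 ms (15 minute) value are
hypothetical. Besides a millisecond value, UPDATE_CACHE_FREQUENCY also
accepts ALWAYS and NEVER.

```sql
-- Hypothetical table; the client re-checks the schema on the server
-- at most once every 900000 ms (15 minutes).
CREATE TABLE IF NOT EXISTS EVENTS (
    HOST VARCHAR NOT NULL,
    SEQ  BIGINT NOT NULL,
    VAL  VARCHAR
    CONSTRAINT PK PRIMARY KEY (HOST, SEQ)
) UPDATE_CACHE_FREQUENCY = 900000;

-- The same property can be changed on an existing table.
ALTER TABLE EVENTS SET UPDATE_CACHE_FREQUENCY = 900000;

-- Run on a connection with auto-commit off, so the row stays uncommitted.
-- The client then reads the pending Cells through
-- PhoenixRuntime#getUncommittedDataIterator and rolls the connection back
-- instead of committing.
UPSERT INTO EVENTS (HOST, SEQ, VAL) VALUES ('host-1', 1, 'example');
```

The UPSERT is never committed in this pattern. It serves only to have
Phoenix produce the Cells, which the external writer then ships on its own.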


I haven't seen much demand for bypassing Phoenix JDBC on the read side. If
you don't want to use Phoenix to query, what's the point in using it?

As far as Calcite/Phoenix, it'd be great to see this work picked up. I
don't think this solves the API problem, though. A good home for this
adapter would be Apache Drill IMHO. They're up to a new enough version of
Calcite (and off of their fork) so that this would be feasible and would
provide immediate benefits on the query side.

Thanks,
James

On Tue, Aug 28, 2018 at 1:38 PM Andrew Purtell <apurt...@apache.org> wrote:

> On Mon, Aug 27, 2018 at 11:03 AM Josh Elser <els...@apache.org> wrote:
>
> > 2. Can Phoenix be the de-facto schema for SQL on HBase?
> >
> > We've long asserted "if you have to ask how Phoenix serializes data, you
> > shouldn't be doing it" (a nod that you have to write lots of code). What if
> > we turn that on its head? Could we extract our PDataType serialization,
> > composite row-key, column encoding, etc into a minimal API that folks
> > with their own itches can use?
> >
> > With the growing integrations into Phoenix, we could embrace them by
> > providing an API to make what they're doing easier. In the same vein, we
> > cement ourselves as a cornerstone of doing it "correctly".
> >
>
> There have been discussions where I work where it seems this would be a
> great idea. If data types, row key constructors, and other key and data
> serialization concerns were a public API, these could be used by connectors
> to Spark or other systems to generate and consume Phoenix compatible data.
> It improves the integration story all around.
>
> Another thought for refactoring I've heard is exposing an API for
> generating query plans without needing the SQL parser. A public API for
> programmatically building query plans could be used by connectors to Spark
> or other systems when pushing down parts of a parallelized or federated
> query to Phoenix data sources, avoiding unnecessary hacking of SQL language
> generation, string mangling, or (re)parsing overheads. This kind of
> describes Calcite's raison d'être. If Phoenix is not embedding Calcite as
> query planner, as it does not currently, it is independently useful to have
> a public API for programmatic query plan construction given the current
> implementation regardless. If Phoenix were to embed Calcite as query
> planner, you'd probably get a ton of re-use among internal and external
> users of the Calcite APIs. I'd think whatever option you might choose would
> be informed by the suitability (or not) of embedding Calcite as Phoenix's
> query planner, and how soon that might be expected to be feature complete.
> For what it's worth. Again this extends possibilities for integration.
>
>
> > 3. Better recommendations to users to not attempt certain queries.
> >
> > We definitively know that there are certain types of queries that
> > Phoenix cannot support well (compared to optimal Phoenix use-cases).
> > Users very commonly fall into such pitfalls on their own and this leaves
> > a bad taste in their mouth (thinking that the product "stinks").
> >
> > Can we do a better job of telling the user when and why it happened?
> > What would such a user-interaction model look like? Can we supplement
> > the "why" with instructions of what to do differently (even if in the
> > abstract)?
> >
> > 4. Phoenix-Calcite
> >
> > This was mentioned as a "nice to have". From what I understand, there
> > was nothing explicitly wrong with the implementation or approach, just
> > that it was a massive undertaking to continue with little immediate
> > gain. Would this be a boon for us to try to continue in some form? Are
> > there steps we can take that would help push us along the right path?
> >
> > Anyways, I'd love to hear everyone's thoughts. While the concerns were
> > raised at HBaseCon Asia, the suggestions that accompany them here are
> > largely mine ;). Feel free to break them out into their own threads if
> > you think that would be better (or say that you disagree with me --
> > that's cool too)!
> >
> > - Josh
> >
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
>
