On Tue, Aug 28, 2018 at 2:01 PM James Taylor <jamestay...@apache.org> wrote:

> Glad to hear this was discussed at HBaseCon. The most common request I've
> seen asked for is to be able to write Phoenix-compatible data from other,
> non-Phoenix services/projects, mainly because row-by-row updates (even when
> batched) can be a bottleneck. This is not feasible by using low level
> constructs because of all the features provided by Phoenix: secondary
> indexes, composite row keys, encoded columns, storage formats, salting,
> ascending/descending row keys, array support, etc. The most feasible way to
> accomplish writes outside of Phoenix is to use UPSERT VALUES followed by
> PhoenixRuntime#getUncommittedDataIterator to get the Cells that would be
> committed (followed by rolling back the uncommitted data). This maintains
> Phoenix's abstraction and minimizes any overhead (the cost of parsing is
> negligible). You can control how often the schema is pulled over from the
> server through the UPDATE_CACHE_FREQUENCY declaration.
>
> I haven't seen much demand for bypassing Phoenix JDBC on the read side. If
> you don't want to use Phoenix to query, what's the point in using it?
>

You might have Phoenix clients and HBase clients sharing common data
sources for whatever reason; we cannot assume what constraints or legacy
issues may present themselves in a given Phoenix or HBase user's
environment. I agree, though, that as a question of prioritization maybe it
doesn't get done until a volunteer does it to scratch a real itch, but at
that point it could be useful to accept the contribution.


> As far as Calcite/Phoenix, it'd be great to see this work picked up. I
> don't think this solves the API problem, though. A good home for this
> adapter would be Apache Drill IMHO. They're up to a new enough version of
> Calcite (and off of their fork) so that this would be feasible and would
> provide immediate benefits on the query side.
>
> Thanks,
> James
>
> On Tue, Aug 28, 2018 at 1:38 PM Andrew Purtell <apurt...@apache.org>
> wrote:
>
> > On Mon, Aug 27, 2018 at 11:03 AM Josh Elser <els...@apache.org> wrote:
> >
> > > 2. Can Phoenix be the de-facto schema for SQL on HBase?
> > >
> > > We've long asserted "if you have to ask how Phoenix serializes data,
> you
> > > shouldn't be doing it" (a nod that you have to write lots of code). What
> if
> > > we turn that on its head? Could we extract our PDataType serialization,
> > > composite row-key, column encoding, etc into a minimal API that folks
> > > with their own itches can use?
> > >
> > > With the growing integrations into Phoenix, we could embrace them by
> > > providing an API to make what they're doing easier. In the same vein,
> we
> > > cement ourselves as a cornerstone of doing it "correctly"
> > >
> >
> > There have been discussions where I work where it seems this would be a
> > great idea. If data types, row key constructors, and other key and data
> > serialization concerns were a public API, these could be used by
> connectors
> > to Spark or other systems to generate and consume Phoenix compatible
> data.
> > It improves the integration story all around.
> >
> > Another thought for refactoring I've heard is exposing an API for
> > generating query plans without needing the SQL parser. A public API for
> > programmatically building query plans could be used by connectors to Spark
> or
> > other systems when pushing down parts of a parallelized or federated
> query
> > to Phoenix data sources, avoiding unnecessary hacking of SQL language
> > generation, string mangling, or (re)parsing overheads. This kind of
> > describes Calcite's raison d'être. If Phoenix is not embedding Calcite as
> > query planner, as it does not currently, it is independently useful to
> have
> > a public API for programmatic query plan construction given the current
> > implementation regardless. If Phoenix were to embed Calcite as query
> > planner, you'd probably get a ton of re-use among internal and external
> > users of the Calcite APIs. I'd think whatever option you might choose
> would
> > be informed by the suitability (or not) of embedding Calcite as Phoenix's
> > query planner, and how soon that might be expected to be feature
> complete.
> > For what it's worth. Again this extends possibilities for integration.
> >
> >
> > > 3. Better recommendations to users to not attempt certain queries.
> > >
> > > We definitively know that there are certain types of queries that
> > > Phoenix cannot support well (compared to optimal Phoenix use-cases).
> > > Users very commonly fall into such pitfalls on their own and this
> leaves
> > > a bad taste in their mouth (thinking that the product "stinks").
> > >
> > > Can we do a better job of telling the user when and why it happened?
> > > What would such a user-interaction model look like? Can we supplement
> > > the "why" with instructions of what to do differently (even if in the
> > > abstract)?
> > >
> > > 4. Phoenix-Calcite
> > >
> > > This was mentioned as a "nice to have". From what I understand, there
> > > was nothing explicitly wrong with the implementation or approach, just
> > > that it was a massive undertaking to continue with little immediate
> > > gain. Would this be a boon for us to try to continue in some form? Are
> > > there steps we can take that would help push us along the right path?
> > >
> > > Anyways, I'd love to hear everyone's thoughts. While the concerns were
> > > raised at HBaseCon Asia, the suggestions that accompany them here are
> > > largely mine ;). Feel free to break them out into their own threads if
> > > you think that would be better (or say that you disagree with me --
> > > that's cool too)!
> > >
> > > - Josh
> > >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >    - A23, Crosstalk
> >
>


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk
