On Tue, Aug 28, 2018 at 2:01 PM James Taylor <jamestay...@apache.org> wrote:
> Glad to hear this was discussed at HBaseCon. The most common request
> I've seen asked for is to be able to write Phoenix-compatible data from
> other, non-Phoenix services/projects, mainly because row-by-row updates
> (even when batched) can be a bottleneck. This is not feasible by using
> low level constructs because of all the features provided by Phoenix:
> secondary indexes, composite row keys, encoded columns, storage formats,
> salting, ascending/descending row keys, array support, etc. The most
> feasible way to accomplish writes outside of Phoenix is to use UPSERT
> VALUES followed by PhoenixRuntime#getUncommittedDataIterator to get the
> Cells that would be committed (followed by rolling back the uncommitted
> data). This maintains Phoenix's abstraction and minimizes any overhead
> (the cost of parsing is negligible). You can control the frequency of
> how often the schema is pulled over from the server through the
> UPDATE_CACHE_FREQUENCY declaration.
>
> I haven't seen much demand for bypassing Phoenix JDBC on the read side.
> If you don't want to use Phoenix to query, what's the point in using it?
>

You might have Phoenix clients and HBase clients sharing common data
sources, for whatever reason. We cannot assume what constraints or legacy
issues may present themselves in a given Phoenix or HBase user's
environment. Agreed, though, that as a question of prioritization maybe it
doesn't get done until a volunteer does it to scratch a real itch, but at
that point it could be useful to accept the contribution.

> As far as Calcite/Phoenix, it'd be great to see this work picked up. I
> don't think this solves the API problem, though. A good home for this
> adapter would be Apache Drill IMHO. They're up to a new enough version
> of Calcite (and off of their fork) so that this would be feasible and
> would provide immediate benefits on the query side.
>
> Thanks,
> James
>
> On Tue, Aug 28, 2018 at 1:38 PM Andrew Purtell <apurt...@apache.org>
> wrote:
>
> > On Mon, Aug 27, 2018 at 11:03 AM Josh Elser <els...@apache.org> wrote:
> >
> > > 2. Can Phoenix be the de-facto schema for SQL on HBase?
> > >
> > > We've long asserted "if you have to ask how Phoenix serializes data,
> > > you shouldn't be doing it" (a nod that you have to write lots of
> > > code). What if we turn that on its head? Could we extract our
> > > PDataType serialization, composite row-key, column encoding, etc
> > > into a minimal API that folks with their own itches can use?
> > >
> > > With the growing integrations into Phoenix, we could embrace them by
> > > providing an API to make what they're doing easier. In the same
> > > vein, we cement ourselves as a cornerstone of doing it "correctly"
> > >
> >
> > There have been discussions where I work where it seems this would be
> > a great idea. If data types, row key constructors, and other key and
> > data serialization concerns were a public API, these could be used by
> > connectors to Spark or other systems to generate and consume Phoenix
> > compatible data. It improves the integration story all around.
> >
> > Another thought for refactoring I've heard is exposing an API for
> > generating query plans without needing the SQL parser. A public API
> > for programmatically building query plans could be used by connectors
> > to Spark or other systems when pushing down parts of a parallelized or
> > federated query to Phoenix data sources, avoiding unnecessary hacking
> > of SQL language generation, string mangling, or (re)parsing overheads.
> > This kind of describes Calcite's raison d'être. If Phoenix is not
> > embedding Calcite as query planner, as it does not currently, it is
> > independently useful to have a public API for programmatic query plan
> > construction given the current implementation regardless. If Phoenix
> > were to embed Calcite as query planner, you'd probably get a ton of
> > re-use among internal and external users of the Calcite APIs. I'd
> > think whatever option you might choose would be informed by the
> > suitability (or not) of embedding Calcite as Phoenix's query planner,
> > and how soon that might be expected to be feature complete. For what
> > it's worth. Again this extends possibilities for integration.
> >
> >
> > > 3. Better recommendations to users to not attempt certain queries.
> > >
> > > We definitively know that there are certain types of queries that
> > > Phoenix cannot support well (compared to optimal Phoenix use-cases).
> > > Users very commonly fall into such pitfalls on their own and this
> > > leaves a bad taste in their mouth (thinking that the product
> > > "stinks").
> > >
> > > Can we do a better job of telling the user when and why it happened?
> > > What would such a user-interaction model look like? Can we
> > > supplement the "why" with instructions of what to do differently
> > > (even if in the abstract)?
> > >
> > > 4. Phoenix-Calcite
> > >
> > > This was mentioned as a "nice to have". From what I understand,
> > > there was nothing explicitly wrong with the implementation or
> > > approach, just that it was a massive undertaking to continue with
> > > little immediate gain. Would this be a boon for us to try to
> > > continue in some form? Are there steps we can take that would help
> > > push us along the right path?
> > >
> > > Anyways, I'd love to hear everyone's thoughts. While the concerns
> > > were raised at HBaseCon Asia, the suggestions that accompany them
> > > here are largely mine ;). Feel free to break them out into their own
> > > threads if you think that would be better (or say that you disagree
> > > with me -- that's cool too)!
> > >
> > > - Josh
> > >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >    - A23, Crosstalk
> >
>

--
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk