On Mon, Aug 27, 2018 at 11:03 AM Josh Elser <els...@apache.org> wrote:
> 2. Can Phoenix be the de-facto schema for SQL on HBase?
>
> We've long asserted "if you have to ask how Phoenix serializes data, you
> shouldn't be doing it" (a nod that you have to write lots of code). What if
> we turn that on its head? Could we extract our PDataType serialization,
> composite row-key, column encoding, etc. into a minimal API that folks
> with their own itches can use?
>
> With the growing integrations into Phoenix, we could embrace them by
> providing an API to make what they're doing easier. In the same vein, we
> cement ourselves as a cornerstone of doing it "correctly".
>

There have been discussions where I work in which this seemed like a
great idea. If data types, row key constructors, and other key and data
serialization concerns were a public API, connectors to Spark or other
systems could use them to generate and consume Phoenix-compatible data.
It improves the integration story all around.

Another refactoring idea I've heard is exposing an API for generating
query plans without needing the SQL parser. A public API for
programmatically building query plans could be used by connectors to
Spark or other systems when pushing down parts of a parallelized or
federated query to Phoenix data sources, avoiding unnecessary hacking of
SQL language generation, string mangling, and (re)parsing overhead. This
more or less describes Calcite's raison d'être. If Phoenix does not embed
Calcite as its query planner, as is currently the case, a public API for
programmatic query plan construction would be independently useful with
the current implementation. If Phoenix were to embed Calcite as its query
planner, you'd probably get a ton of reuse among internal and external
users of the Calcite APIs. I'd think whichever option you choose would be
informed by the suitability (or not) of embedding Calcite as Phoenix's
query planner, and how soon that might be expected to be feature
complete. For what it's worth.

Again, this extends possibilities for integration.

> 3. Better recommendations to users to not attempt certain queries.
>
> We definitively know that there are certain types of queries that
> Phoenix cannot support well (compared to optimal Phoenix use-cases).
> Users very commonly fall into such pitfalls on their own and this leaves
> a bad taste in their mouth (thinking that the product "stinks").
>
> Can we do a better job of telling the user when and why it happened?
> What would such a user-interaction model look like? Can we supplement
> the "why" with instructions of what to do differently (even if in the
> abstract)?
>
> 4. Phoenix-Calcite
>
> This was mentioned as a "nice to have". From what I understand, there
> was nothing explicitly wrong with the implementation or approach, just
> that it was a massive undertaking to continue with little immediate
> gain. Would this be a boon for us to try to continue in some form? Are
> there steps we can take that would help push us along the right path?
>
> Anyways, I'd love to hear everyone's thoughts. While the concerns were
> raised at HBaseCon Asia, the suggestions that accompany them here are
> largely mine ;). Feel free to break them out into their own threads if
> you think that would be better (or say that you disagree with me --
> that's cool too)!
>
> - Josh
>

-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk