Sorry for coming a bit late to this. I've been thinking along some of these lines for a bit.

It seems Phoenix serves 4 distinct purposes:

1. Query parsing and compiling
2. A type system
3. Query execution
4. Efficient HBase interface

Each of these is useful by itself, but we do not expose them as stable interfaces. We have seen a lot of need to tie HBase into "higher level" services, such as Spark (and Presto, etc).

I think we can get a long way if we separate at least #1 (SQL) from the rest: #2, #3, and #4 (Typed HBase Interface - THI). Phoenix is used via SQL (#1); other tools such as Presto, Impala, Drill, Spark, etc. can interface efficiently with HBase via THI (#2, #3, and #4).

Thoughts?

-- Lars

On Monday, August 27, 2018, 11:03:33 AM PDT, Josh Elser <els...@apache.org> wrote:

(bcc: dev@hbase, in case folks there have been waiting for me to send this email to dev@phoenix)
Hi,

In case you missed it, there was an HBaseCon event held in Asia recently. Stack took some great notes and shared them with the HBase community. A few of them touched on Phoenix, directly or in a related manner. I think they are good "criticisms" that are beneficial for us to hear.

1. The phoenix-$version-client.jar size is prohibitively large

In this day and age, I'm surprised that this is a big issue for people. I know we have a lot of cruft, most of which comes from Hadoop. We have gotten better here over recent releases, but I would guess that there is more we can do.

2. Can Phoenix be the de-facto schema for SQL on HBase?

We've long asserted "if you have to ask how Phoenix serializes data, you shouldn't be doing it" (a nod that you have to write lots of code). What if we turn that on its head? Could we extract our PDataType serialization, composite row-key, column encoding, etc. into a minimal API that folks with their own itches can use? With the growing integrations into Phoenix, we could embrace them by providing an API to make what they're doing easier. In the same vein, we cement ourselves as a cornerstone of doing it "correctly".

3. Better recommendations to users to not attempt certain queries.

We definitively know that there are certain types of queries that Phoenix cannot support well (compared to optimal Phoenix use-cases). Users very commonly fall into such pitfalls on their own, and this leaves a bad taste in their mouth (thinking that the product "stinks"). Can we do a better job of telling the user when and why it happened? What would such a user-interaction model look like? Can we supplement the "why" with instructions on what to do differently (even if in the abstract)?

4. Phoenix-Calcite

This was mentioned as a "nice to have". From what I understand, there was nothing explicitly wrong with the implementation or approach, just that it was a massive undertaking to continue with little immediate gain.
Would this be a boon for us to try to continue in some form? Are there steps we can take that would help push us along the right path?

Anyways, I'd love to hear everyone's thoughts. While the concerns were raised at HBaseCon Asia, the suggestions that accompany them here are largely mine ;). Feel free to break them out into their own threads if you think that would be better (or say that you disagree with me -- that's cool too)!

- Josh