I think we can start by implementing a tighter integration with Spark through DataSource V2. That would quickly make it apparent which parts of Phoenix would need direct access. Some parts just need an interface audience declaration (like Phoenix's basic type system) and our agreement that we will change them only according to semantic versioning. Others (like the query plan) will need a bit more thinking. Maybe that's the path to hook in Calcite; I'm just making that part up as I write this... Perhaps turning the HBase interface into an API might not be so difficult either. That would perhaps be a new, strictly additive client API.
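To make the point about declaring the type system a stable interface concrete: once the type system is public, the byte layout it produces becomes part of the contract, and changing it would be a breaking change under semantic versioning. Below is a minimal Python sketch, assuming an order-preserving layout loosely modeled on how Phoenix encodes ascending INTEGER and VARCHAR row-key columns (sign bit flipped on a 4-byte big-endian integer, a zero-byte separator after non-final variable-length columns). It is not Phoenix's PDataType code: DESC ordering, NULLs, and all other types are left out, and `encode_int`, `encode_varchar`, and `encode_row_key` are hypothetical names.

```python
import struct

# Assumption: a zero byte terminates a variable-length key column
# unless it is the last column in the row key.
SEPARATOR = b"\x00"


def encode_int(value: int) -> bytes:
    """Encode a signed 32-bit int so unsigned byte order matches numeric order."""
    if not -2**31 <= value < 2**31:
        raise ValueError("out of range for a 32-bit INTEGER")
    # Flip the sign bit of the two's-complement form so negatives sort first.
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)


def encode_varchar(value: str) -> bytes:
    """Encode a string as UTF-8; a zero byte would collide with the separator."""
    data = value.encode("utf-8")
    if SEPARATOR in data:
        raise ValueError("VARCHAR key values must not contain a zero byte")
    return data


def encode_row_key(columns):
    """Concatenate (type_name, value) pairs into one composite row key."""
    parts = []
    for i, (col_type, value) in enumerate(columns):
        if col_type == "INTEGER":
            parts.append(encode_int(value))
        elif col_type == "VARCHAR":
            parts.append(encode_varchar(value))
            if i < len(columns) - 1:
                parts.append(SEPARATOR)  # only between columns, never trailing
        else:
            raise NotImplementedError(col_type)
    return b"".join(parts)
```

The separator is what keeps the composite key sortable: without it, the key for ("a", 1) would start with b"a\x80" and sort after ("ab", 0), whose key starts with b"ab", because 0x80 is greater than "b" (0x62). Any external engine writing keys through such an API would depend on exactly these details.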
A good Spark interface is in everybody's interest, and I think it is the best avenue to figure out what's missing/needed.

-- Lars

On Wednesday, September 12, 2018, 12:47:21 PM PDT, Josh Elser <els...@apache.org> wrote:

I like it, Lars. I like it very much.

Just the easy part of doing it... ;)

On 9/11/18 4:53 PM, la...@apache.org wrote:
> Sorry for coming a bit late to this. I've been thinking along these lines
> for a bit.
>
> It seems Phoenix serves 4 distinct purposes:
> 1. Query parsing and compiling
> 2. A type system
> 3. Query execution
> 4. Efficient HBase interface
>
> Each of these is useful by itself, but we do not expose them as stable
> interfaces. We have seen a lot of need to tie HBase into "higher level"
> services, such as Spark (and Presto, etc).
>
> I think we can get a long way if we separate at least #1 (SQL) from the
> rest: #2, #3, and #4 (Typed HBase Interface, THI).
>
> Phoenix is used via SQL (#1); other tools such as Presto, Impala, Drill,
> Spark, etc. can interface efficiently with HBase via THI (#2, #3, and #4).
>
> Thoughts?
>
> -- Lars
>
> On Monday, August 27, 2018, 11:03:33 AM PDT, Josh Elser
> <els...@apache.org> wrote:
>
> (bcc: dev@hbase, in case folks there have been waiting for me to send
> this email to dev@phoenix)
>
> Hi,
>
> In case you missed it, there was an HBaseCon event held in Asia
> recently. Stack took some great notes and shared them with the HBase
> community. A few of them touched on Phoenix, directly or in a related
> manner. I think they are good "criticisms" that are beneficial for us to
> hear.
>
> 1. The phoenix-$version-client.jar size is prohibitively large
>
> In this day and age, I'm surprised that this is a big issue for people.
> I know we have a lot of cruft, most of which comes from Hadoop. We have
> gotten better here over recent releases, but I would guess that there is
> more we can do.
>
> 2. Can Phoenix be the de-facto schema for SQL on HBase?
>
> We've long asserted "if you have to ask how Phoenix serializes data, you
> shouldn't be doing it" (a nod that you have to write lots of code). What if
> we turn that on its head? Could we extract our PDataType serialization,
> composite row-key, column encoding, etc. into a minimal API that folks
> with their own itches can use?
>
> With the growing integrations into Phoenix, we could embrace them by
> providing an API to make what they're doing easier. In the same vein, we
> cement ourselves as a cornerstone of doing it "correctly".
>
> 3. Better recommendations to users to not attempt certain queries.
>
> We definitively know that there are certain types of queries that
> Phoenix cannot support well (compared to optimal Phoenix use-cases).
> Users very commonly fall into such pitfalls on their own, and this leaves
> a bad taste in their mouth (thinking that the product "stinks").
>
> Can we do a better job of telling the user when and why it happened?
> What would such a user-interaction model look like? Can we supplement
> the "why" with instructions of what to do differently (even if in the
> abstract)?
>
> 4. Phoenix-Calcite
>
> This was mentioned as a "nice to have". From what I understand, there
> was nothing explicitly wrong with the implementation or approach, just
> that it was a massive undertaking to continue with little immediate
> gain. Would this be a boon for us to try to continue in some form? Are
> there steps we can take that would help push us along the right path?
>
> Anyways, I'd love to hear everyone's thoughts. While the concerns were
> raised at HBaseCon Asia, the suggestions that accompany them here are
> largely mine ;). Feel free to break them out into their own threads if
> you think that would be better (or say that you disagree with me;
> that's cool too)!
>
> - Josh
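Point 3 in Josh's original message asks how Phoenix could tell users when and why a query will perform poorly, and what to do instead. One simple signal, sketched here in Python under heavy assumptions, is whether the WHERE clause constrains the leading primary-key column: in this simplified model, if it does not, the row-key range cannot be narrowed and the query scans the whole table. The sketch deliberately ignores skip scans, secondary indexes, salting, and statistics, all of which a real planner would weigh, and `bounded_prefix` and `advise` are hypothetical names, not Phoenix APIs.

```python
def bounded_prefix(pk_columns, filtered):
    """Count how many leading primary-key columns the WHERE clause constrains."""
    count = 0
    for col in pk_columns:
        if col not in filtered:
            break  # a gap ends the usable row-key prefix
        count += 1
    return count


def advise(table, pk_columns, filtered_columns):
    """Return (full_scan, message) for a query filtering on filtered_columns.

    pk_columns lists the table's primary-key columns in row-key order.
    """
    filtered = set(filtered_columns)
    prefix = bounded_prefix(pk_columns, filtered)
    if prefix > 0:
        used = ", ".join(pk_columns[:prefix])
        return False, f"{table}: row-key range bounded by {used}."
    # Nothing narrows the start/stop row, so report the cause and a remedy.
    lead = pk_columns[0]
    why = (f"{table}: full table scan, because no predicate constrains "
           f"the leading key column {lead}")
    if filtered:
        what = (f"filter on {lead}, or add a secondary index on "
                f"{', '.join(sorted(filtered))}")
    else:
        what = f"add a WHERE predicate on {lead}"
    return True, f"{why}. Consider: {what}."
```

For a table keyed on (TENANT_ID, EVENT_TIME), a query filtering only on EVENT_TIME would be flagged, and the message pairs the "why" with a suggestion of what to do differently, which is the shape of user interaction the question describes.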