100% agreement. A bit worried about "boiling the ocean" and risking not getting anything done.

Speaking of modules: I would *love* it if we had a simple HBase abstraction API and then a module for each version of HBase, rather than a different branch for each. Most differences are presumably in the coprocessor APIs, which should be able to be "wrapped away" with some indirection layer.
-- Lars

On Monday, September 17, 2018, 8:52:58 AM PDT, Josh Elser <els...@apache.org> wrote:

Maybe an implementation detail, but I'm a fan of having a devoted Maven module for the "client-facing" API as opposed to an annotation-based approach. I find a separate module helps to catch problematic API design faster, and makes it crystal clear what users should (and should not) be relying upon.

On 9/17/18 1:00 AM, la...@apache.org wrote:
> I think we can start by implementing a tighter integration with Spark
> through DataSource V2. That would make it quickly apparent what parts of
> Phoenix would need direct access.
> Some parts just need an interface audience declaration (like Phoenix's basic
> type system) and our agreement that we will change those only according to
> semantic versioning. Others (like the query plan) will need a bit more
> thinking. Maybe that's the path to hook Calcite - just making that part up as
> I write this...
> Perhaps turning the HBase interface into an API might not be so difficult
> either. That would perhaps be a new - strictly additional - client API.
>
> A good Spark interface is in everybody's interest and I think is the best
> avenue to figure out what's missing/needed.
> -- Lars
>
> On Wednesday, September 12, 2018, 12:47:21 PM PDT, Josh Elser
> <els...@apache.org> wrote:
>
> I like it, Lars. I like it very much.
>
> Just the easy part of doing it... ;)
>
> On 9/11/18 4:53 PM, la...@apache.org wrote:
>> Sorry for coming a bit late to this. I've been thinking along some of these
>> lines for a bit.
>> It seems Phoenix serves 4 distinct purposes:
>> 1. Query parsing and compiling
>> 2. A type system
>> 3. Query execution
>> 4. Efficient HBase interface
>> Each of these is useful by itself, but we do not expose these as stable
>> interfaces. We have seen a lot of need to tie HBase into "higher level"
>> services, such as Spark (and Presto, etc).
>> I think we can get a long way if we separate at least #1 (SQL) from the rest,
>> #2, #3, and #4 (Typed HBase Interface - THI).
>> Phoenix is used via SQL (#1); other tools such as Presto, Impala, Drill,
>> Spark, etc. can interface efficiently with HBase via THI (#2, #3, and #4).
>> Thoughts?
>> -- Lars
>>
>> On Monday, August 27, 2018, 11:03:33 AM PDT, Josh Elser
>> <els...@apache.org> wrote:
>>
>> (bcc: dev@hbase, in case folks there have been waiting for me to send
>> this email to dev@phoenix)
>>
>> Hi,
>>
>> In case you missed it, there was an HBaseCon event held in Asia
>> recently. Stack took some great notes and shared them with the HBase
>> community. A few of them touched on Phoenix, directly or in a related
>> manner. I think they are good "criticisms" that are beneficial for us to
>> hear.
>>
>> 1. The phoenix-$version-client.jar size is prohibitively large
>>
>> In this day and age, I'm surprised that this is a big issue for people.
>> I know we have a lot of cruft, most of which comes from Hadoop. We have
>> gotten better here over recent releases, but I would guess that there is
>> more we can do.
>>
>> 2. Can Phoenix be the de facto schema for SQL on HBase?
>>
>> We've long asserted "if you have to ask how Phoenix serializes data, you
>> shouldn't be doing it" (a nod that you have to write lots of code). What if
>> we turn that on its head? Could we extract our PDataType serialization,
>> composite row-key, column encoding, etc. into a minimal API that folks
>> with their own itches can use?
>>
>> With the growing integrations into Phoenix, we could embrace them by
>> providing an API to make what they're doing easier. In the same vein, we
>> cement ourselves as a cornerstone of doing it "correctly".
>>
>> 3. Better recommendations to users to not attempt certain queries.
>>
>> We definitively know that there are certain types of queries that
>> Phoenix cannot support well (compared to optimal Phoenix use-cases).
>> Users very commonly fall into such pitfalls on their own and this leaves
>> a bad taste in their mouths (thinking that the product "stinks").
>>
>> Can we do a better job of telling the user when and why it happened?
>> What would such a user-interaction model look like? Can we supplement
>> the "why" with instructions on what to do differently (even if in the
>> abstract)?
>>
>> 4. Phoenix-Calcite
>>
>> This was mentioned as a "nice to have". From what I understand, there
>> was nothing explicitly wrong with the implementation or approach, just
>> that it was a massive undertaking to continue with little immediate
>> gain. Would this be a boon for us to try to continue in some form? Are
>> there steps we can take that would help push us along the right path?
>>
>> Anyways, I'd love to hear everyone's thoughts. While the concerns were
>> raised at HBaseCon Asia, the suggestions that accompany them here are
>> largely mine ;). Feel free to break them out into their own threads if
>> you think that would be better (or say that you disagree with me --
>> that's cool too)!
>>
>> - Josh