100% agreement.
A bit worried about "boiling the ocean" and risking not getting anything done.
Speaking of modules: I would *love* it if we had a simple HBase abstraction API 
and then a module for each version of HBase, rather than a different branch 
for each. Most differences are presumably in the coprocessor APIs, which should 
be able to be "wrapped away" with some indirection layer.

-- Lars

    On Monday, September 17, 2018, 8:52:58 AM PDT, Josh Elser 
<els...@apache.org> wrote:  
 
 Maybe an implementation detail, but I'm a fan of having a dedicated Maven 
module for the "client-facing" API as opposed to an annotation-based 
approach. I find a separate module helps to catch problematic API design 
faster, and makes it crystal clear what users should (and should not) be 
relying upon.

On 9/17/18 1:00 AM, la...@apache.org wrote:
>  I think we can start by implementing a tighter integration with Spark 
> through DataSource V2. That would make it quickly apparent what parts of 
> Phoenix would need direct access.
> Some parts just need an interface audience declaration (like Phoenix's basic 
> type system) and our agreement that we will change those only according to 
> semantic versioning. Others (like the query plan) will need a bit more 
> thinking. Maybe that's the path to hook in Calcite - just making that part up 
> as I write this...
> Perhaps turning the HBase interface into an API might not be so difficult 
> either. That would perhaps be a new, strictly additional, client API.
> 
> A good Spark interface is in everybody's interest and I think is the best 
> avenue to figure out what's missing/needed.
> -- Lars
> 
>      On Wednesday, September 12, 2018, 12:47:21 PM PDT, Josh Elser 
><els...@apache.org> wrote:
>  
>  I like it, Lars. I like it very much.
> 
> Just the easy part of doing it... ;)
> 
> On 9/11/18 4:53 PM, la...@apache.org wrote:
>>    Sorry for coming a bit late to this. I've been thinking along some of 
>> these lines for a bit.
>> It seems Phoenix serves 4 distinct purposes:
>> 1. Query parsing and compiling
>> 2. A type system
>> 3. Query execution
>> 4. Efficient HBase interface
>> Each of these is useful by itself, but we do not expose these as stable 
>> interfaces. We have seen a lot of need to tie HBase into "higher level" 
>> services, such as Spark (and Presto, etc).
>> I think we can get a long way if we separate at least #1 (SQL) from the rest 
>> #2, #3, and #4 (Typed HBase Interface - THI).
>> Phoenix is used via SQL (#1), other tools such as Presto, Impala, Drill, 
>> Spark, etc, can interface efficiently with HBase via THI (#2, #3, and #4).
>> Thoughts?
>> -- Lars
>>        On Monday, August 27, 2018, 11:03:33 AM PDT, Josh Elser 
>><els...@apache.org> wrote:
>>    
>>    (bcc: dev@hbase, in case folks there have been waiting for me to send
>> this email to dev@phoenix)
>>
>> Hi,
>>
>> In case you missed it, there was an HBaseCon event held in Asia
>> recently. Stack took some great notes and shared them with the HBase
>> community. A few of them touched on Phoenix, directly or in a related
>> manner. I think they are good "criticisms" that are beneficial for us to
>> hear.
>>
>> 1. The phoenix-$version-client.jar size is prohibitively large
>>
>> In this day and age, I'm surprised that this is a big issue for people.
>> I know we have a lot of cruft, most of which comes from Hadoop. We have
>> gotten better here over recent releases, but I would guess that there is
>> more we can do.
>>
>> 2. Can Phoenix be the de-facto schema for SQL on HBase?
>>
>> We've long asserted "if you have to ask how Phoenix serializes data, you
>> shouldn't be doing it" (a nod that you have to write lots of code). What if
>> we turn that on its head? Could we extract our PDataType serialization,
>> composite row-key, column encoding, etc into a minimal API that folks
>> with their own itches can use?
>>
>> With the growing integrations into Phoenix, we could embrace them by
>> providing an API to make what they're doing easier. In the same vein, we
>> cement ourselves as a cornerstone of doing it "correctly".
>>
>> 3. Better recommendations to users to not attempt certain queries.
>>
>> We definitively know that there are certain types of queries that
>> Phoenix cannot support well (compared to optimal Phoenix use-cases).
>> Users very commonly fall into such pitfalls on their own and this leaves
>> a bad taste in their mouth (thinking that the product "stinks").
>>
>> Can we do a better job of telling the user when and why it happened?
>> What would such a user-interaction model look like? Can we supplement
>> the "why" with instructions of what to do differently (even if in the
>> abstract)?
>>
>> 4. Phoenix-Calcite
>>
>> This was mentioned as a "nice to have". From what I understand, there
>> was nothing explicitly wrong with the implementation or approach, just
>> that it was a massive undertaking to continue with little immediate
>> gain. Would this be a boon for us to try to continue in some form? Are
>> there steps we can take that would help push us along the right path?
>>
>> Anyways, I'd love to hear everyone's thoughts. While the concerns were
>> raised at HBaseCon Asia, the suggestions that accompany them here are
>> largely mine ;). Feel free to break them out into their own threads if
>> you think that would be better (or say that you disagree with me --
>> that's cool too)!
>>
>> - Josh
>>      
>>
>    
> 
  
