Re: [DISCUSS] Suggestions for Phoenix from HBaseCon Asia notes

Josh Elser Wed, 12 Sep 2018 12:47:34 -0700

I like it, Lars. I like it very much.

Just the easy part of doing it... ;)


On 9/11/18 4:53 PM, [email protected] wrote:

  Sorry for coming a bit late to this. I've been thinking along these lines
for a bit.
It seems Phoenix serves 4 distinct purposes:
1. Query parsing and compiling
2. A type system
3. Query execution
4. Efficient HBase interface
Each of these is useful by itself, but we do not expose them as stable
interfaces. We have seen a lot of need to tie HBase into "higher level"
services, such as Spark (and Presto, etc).
I think we can get a long way if we separate at least #1 (SQL) from the rest:
#2, #3, and #4 (Typed HBase Interface - THI).
Phoenix is used via SQL (#1); other tools such as Presto, Impala, Drill, Spark,
etc, can interface efficiently with HBase via THI (#2, #3, and #4).
Thoughts?
-- Lars
     On Monday, August 27, 2018, 11:03:33 AM PDT, Josh Elser 
<[email protected]> wrote:

(bcc: dev@hbase, in case folks there have been waiting for me to send
this email to dev@phoenix)

Hi,

In case you missed it, there was an HBaseCon event held in Asia
recently. Stack took some great notes and shared them with the HBase
community. A few of them touched on Phoenix, directly or in a related
manner. I think they are good "criticisms" that are beneficial for us to
hear.

1. The phoenix-$version-client.jar size is prohibitively large

In this day and age, I'm surprised that this is a big issue for people.
I know we have a lot of cruft, most of it coming from Hadoop. We have
gotten better here over recent releases, but I would guess that there is
more we can do.

2. Can Phoenix be the de-facto schema for SQL on HBase?

We've long asserted "if you have to ask how Phoenix serializes data, you
shouldn't be doing it" (a nod that you have to write lots of code). What if
we turn that on its head? Could we extract our PDataType serialization,
composite row-key, column encoding, etc into a minimal API that folks
with their own itches can use?
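
To make the idea concrete, here is a rough sketch of the kind of thing such
a minimal API would cover. It is not Phoenix code: RowKeySketch and its
methods are hypothetical names, and the encoding is only an assumption
modeled on my understanding of how Phoenix lays out keys (a sign-flipped
big-endian int so that byte order matches numeric order, and a zero byte
terminating a variable-length string in a composite row key). It also
assumes the string itself contains no zero byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not an existing Phoenix class.
final class RowKeySketch {

    // Encode a signed int as 4 big-endian bytes with the sign bit flipped,
    // so that unsigned byte-wise comparison agrees with numeric order.
    static byte[] encodeInt(int value) {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    // Inverse of encodeInt, reading 4 bytes starting at offset.
    static int decodeInt(byte[] bytes, int offset) {
        int flipped = ((bytes[offset] & 0xff) << 24)
                    | ((bytes[offset + 1] & 0xff) << 16)
                    | ((bytes[offset + 2] & 0xff) << 8)
                    | (bytes[offset + 3] & 0xff);
        return flipped ^ 0x80000000;
    }

    // Composite key: UTF-8 string, a zero-byte separator, then the int.
    // Assumes the string contains no zero byte.
    static byte[] compositeKey(String name, int id) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        byte[] idBytes = encodeInt(id);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(nameBytes, 0, nameBytes.length);
        out.write(0);
        out.write(idBytes, 0, idBytes.length);
        return out.toByteArray();
    }

    // Unsigned lexicographic comparison, the order HBase keeps row keys in.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

The point is that a tool writing keys with compositeKey("a", 2) and
compositeKey("a", 10) gets them sorted in the expected order without
knowing anything about SQL.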

With the growing integrations into Phoenix, we could embrace them by
providing an API to make what they're doing easier. In the same vein, we
cement ourselves as a cornerstone of doing it "correctly".

3. Better recommendations to users to not attempt certain queries.

We definitively know that there are certain types of queries that
Phoenix cannot support well (compared to optimal Phoenix use-cases).
Users very commonly fall into such pitfalls on their own and this leaves
a bad taste in their mouth (thinking that the product "stinks").

Can we do a better job of telling the user when and why it happened?
What would such a user-interaction model look like? Can we supplement
the "why" with instructions of what to do differently (even if in the
abstract)?

4. Phoenix-Calcite

This was mentioned as a "nice to have". From what I understand, there
was nothing explicitly wrong with the implementation or approach, just
that it was a massive undertaking to continue with little immediate
gain. Would this be a boon for us to try to continue in some form? Are
there steps we can take that would help push us along the right path?

Anyways, I'd love to hear everyone's thoughts. While the concerns were
raised at HBaseCon Asia, the suggestions that accompany them here are
largely mine ;). Feel free to break them out into their own threads if
you think that would be better (or say that you disagree with me --
that's cool too)!

- Josh
