Thanks, Lars for bringing that old discussion back, I did briefly talk
about it with Istvan and Josh.
As you pointed out, Istvan's attempt to modularize based on dependencies is
a good first step towards that goal.


@Istvan Toth <st...@cloudera.com>
Will try and test this on our local clusters sometime this week and will
let you know.

Thanks
Jacob


On Sun, Apr 18, 2021 at 6:13 PM la...@apache.org <la...@apache.org> wrote:

> There is also another angle to look at. A long time ago I wrote this:
>
> "
> It seems Phoenix serves 4 distinct purposes:
> 1. Query parsing and compiling.
> 2. A type system
> 3. Query execution
> 4. Efficient HBase interface
>
> Each of these is useful by itself, but we do not expose these as stable
> interfaces.
> We have seen a lot of need to tie HBase into "higher level" service, such
> as Spark (and Presto, etc).
> I think we can get a long way if we separate at least #1 (SQL) from the
> rest #2, #3, and #4 (Typed HBase Interface - THI).
> Phoenix is used via SQL (#1), other tools such as Presto, Impala, Drill,
> Spark, etc, can interface efficiently with HBase via THI (#2, #3, and #4).
> "
>
> I still believe this is an additional useful demarcation for how to group
> the code. And coincided somewhat with server/client.
>
> Query parsing and the type system are client. Query execution and HBase
> interface are both client and server.
>
> -- Lars
>
> On Wednesday, April 14, 2021, 8:56:08 AM PDT, Istvan Toth <
> st...@apache.org> wrote:
>
>
>
>
>
> Jacob, Josh and me had a discussion about the topic.
>
> I'm attaching the dependency graph of the proposed modules
>
>
>
> On Fri, Apr 9, 2021 at 6:30 AM Istvan Toth <st...@cloudera.com> wrote:
> > The bulk of the changes I'm working on is indeed the separation of the
> client and the server side code.
> >
> > Separating the MR related classes, and the tools-specific code (main,
> options parsing, etc) makes sense to me, if we don't mind adding another
> module.
> >
> > In the first WIP iteration, I'm splitting out everything that depends on
> more than hbase-client into a "server" module.
> > Once that works I will look at splitting that further into a  real
> "server" and an "MR/tools" module.
> >
> >
> > My initial estimates about splitting the server side code were way too
> optimistic, we have to touch a lot of code to break circular dependencies
> between the client and server side. The changes are still quite trivial,
> but the patch is going to be huge and scary.
> >
> >
> > Tests are also going to be a problem, we're probably going to have to
> move most of them into the "server" or a separate "tests" module, as the
> MiniCluster tests depend on code from each module.
> >
> > The plan in PHOENIX-5483, and Lars's mail sounds good, but I think that
> it would be more about dividing the "client-side" module further.
> > (BTW I think that making the indexing engine available separately would
> also be a popular feature )
> >
> >
> >
> > On Fri, Apr 9, 2021 at 5:39 AM Daniel Wong <dbw...@apache.org> wrote:
> >> This is another project I am interested in as well as my group at
> >> Salesforce.  We have had some discussions internally on this but I
> wasn't
> >> aware of this specific Spark issue (We only allow phoenix access via
> spark
> >> by default).  I think the approaches outlined are a good initial step
> but
> >> we were also considering a larger breakup of phoenix-core.  I don't
> >> think the desire for the larger step should stop us from doing the
> initial
> >> ones Istavan and Josh proposed.  I think the high level plan makes sense
> >> but I might prefer a different name than phoenix-tools for the ones we
> want
> >> to be available to external libraries like phoenix-connectors.  Another
> >> possible alternative is to restructure maybe less invasively by making
> >> phoenix core like your proposed tools and making a phoenix-internal or
> >> similar for the future.
> >> One thing I was wondering was how much effort it was to split
> client/server
> >> through phoenix-core...  Lars layed out a good component view of phoenix
> >> whosethe first step might be PHOENIx-5483 but we could focus on highest
> >> level separation rather than bottom up.  However, even that thread
> linked
> >> there talks about a client-facing api which we can piggyback for this
> use.
> >> Say phoeinx-public-api or similar.
> >>
> >> On Wed, Apr 7, 2021 at 9:43 AM Jacob Isaac <jacobpisaa...@gmail.com>
> wrote:
> >>
> >>> Hi Josh & Istvan
> >>>
> >>> Thanks Istvan for looking into this, I am also interested in solving
> this
> >>> problem,
> >>> Let me know how I can help?
> >>>
> >>> Thanks
> >>> Jacob
> >>>
> >>> On Wed, Apr 7, 2021 at 9:05 AM Josh Elser <els...@apache.org> wrote:
> >>>
> >>> > Thanks for trying to tackle this sticky problem, Istvan. For the
> context
> >>> > of everyone else, the real-life problem Istvan is trying to fix is
> that
> >>> > you cannot run a Spark application with both HBase and Phoenix jars
> on
> >>> > the classpath.
> >>> >
> >>> > If I understand this correctly, it's that the HBase API signatures
> are
> >>> > different depending on whether we are "client side" or "server side"
> >>> > (within a RegionServer). Your comment on PHOENIX-6053 shows that
> >>> > (signatures on Table.java around Protobuf's Service class having
> shaded
> >>> > relocation vs. the original com.google.protobuf coordinates).
> >>> >
> >>> > I think the reason we have the monolithic phoenix-core is that we
> have
> >>> > so much logic which is executed on both the client and server side.
> For
> >>> > example, we may push a filter operation to the server-side or we many
> >>> > run it client-side. That's also why we have the "thin" phoenix-server
> >>> > Maven module which just re-packages phoenix-core.
> >>> >
> >>> > Is it possible that we change phoenix-server so that it contains the
> >>> > "server-side" code that we don't want to have using the HBase classes
> >>> > with thirdparty relocations, rather than introduce another new Maven
> >>> > module?
> >>> >
> >>> > Looking through your WIP PR too.
> >>> >
> >>> > On 4/7/21 1:10 AM, Istvan Toth wrote:
> >>> > > Hi!
> >>> > >
> >>> > > I've been working on getting Phoenix working with
> >>> > hbase-shaded-client.jar,
> >>> > > and I am finally getting traction.
> >>> > >
> >>> > > One of the issues that I encountered is that we are mixing client
> and
> >>> > > server side code in phoenix-core, and there's a
> >>> > > mutual interdependence between the two.
> >>> > >
> >>> > > Fixing this is not hard, as it's mostly about replacing
> >>> .class.getName()
> >>> > s
> >>> > > with string constants, and moving around some inconveniently placed
> >>> > static
> >>> > > utility methods, and now I have a WIP version where the client side
> >>> > doesn't
> >>> > > depend on server classes.
> >>> > >
> >>> > > However, unless we change the project structure, and factor out the
> >>> > classes
> >>> > > that depend on server-side APIs, this will be extremely fragile,
> as any
> >>> > > change can (and will) re-introduce the circular dependency between
> the
> >>> > > classes.
> >>> > >
> >>> > > To solve this issue I propose the following:
> >>> > >
> >>> > >     - clean up phoenix-core, so that only classes that depend only
> on
> >>> > >     *hbase-client* (or at worst only on classes that are present in
> >>> > >     *hbase-shaded-client*) remain. This should be 90+% of the code
> >>> > >     - move all classes (mostly coprocessors and their support code)
> >>> that
> >>> > use
> >>> > >     the server API (*hbase-server* mostly) to a new module, say
> >>> > >     phoenix-coprocessors (the phoenix-server module name is taken).
> >>> This
> >>> > new
> >>> > >     class depends on phoenix-core.
> >>> > >     - move all classes that directly depend on MapReduce, and their
> >>> > main()
> >>> > >     classes to the existing phoenix-tools module (which also
> depends on
> >>> > core)
> >>> > >
> >>> > > The separation would be primarily based on API use, at the first
> cut
> >>> I'd
> >>> > be
> >>> > > fine with keeping all logic phoenix-core, and referencing that. We
> may
> >>> or
> >>> > > may not want to move logic that is only used in coprocessors or
> tools,
> >>> > but
> >>> > > doesn't use the respective APIs to the new modules later.
> >>> > >
> >>> > > As for the main artifacts:
> >>> > >
> >>> > >     - *phoenix-server.jar* would include code from all three
> classes.
> >>> > >     - A newly added *phoenix-client-byo-shaded-hbase.jar *would
> include
> >>> > only
> >>> > >     the code from cleaned-up phoenix-core
> >>> > >     - Ideally, we'd remove the the tools and coprocessor code (and
> >>> > >     dependencies) from the standard and embedded clients, and
> switch
> >>> > >     documentation to use *phoenix-server* to run the MR tools, but
> this
> >>> > is
> >>> > >     optional.
> >>> > >
> >>> > > I am tracking this work in PHOENIX-6053, which has a (currently
> >>> working)
> >>> > > WIP patch attached.
> >>> > >
> >>> > > I think that this change would fit the pattern established by
> creating
> >>> > the
> >>> > > phoenix-tools module,
> >>> > > but as this is major change in project structure (even if the
> actual
> >>> Java
> >>> > > changes are trivial),
> >>> > > I'd like to gather your input on this approach (please also speak
> up if
> >>> > you
> >>> > > agree).
> >>> > >
> >>> > > regards
> >>> > > Istvan
> >>> > >
> >>> >
> >>>
> >>
> >
> >
> > --
> > István Tóth  | Staff Software Engineer
> >
> > st...@cloudera.com
> >
> >
> > ________________________________
> >
>
>

Reply via email to