Thanks, Lars for bringing that old discussion back, I did briefly talk about it with Istvan and Josh. As you pointed out, Istvan's attempt to modularize based on dependencies is a good first step towards that goal.
@Istvan Toth <st...@cloudera.com> Will try and test this on our local clusters sometime this week and will let you know. Thanks Jacob On Sun, Apr 18, 2021 at 6:13 PM la...@apache.org <la...@apache.org> wrote: > There is also another angle to look at. A long time ago I wrote this: > > " > It seems Phoenix serves 4 distinct purposes: > 1. Query parsing and compiling. > 2. A type system > 3. Query execution > 4. Efficient HBase interface > > Each of these is useful by itself, but we do not expose these as stable > interfaces. > We have seen a lot of need to tie HBase into "higher level" service, such > as Spark (and Presto, etc). > I think we can get a long way if we separate at least #1 (SQL) from the > rest #2, #3, and #4 (Typed HBase Interface - THI). > Phoenix is used via SQL (#1), other tools such as Presto, Impala, Drill, > Spark, etc, can interface efficiently with HBase via THI (#2, #3, and #4). > " > > I still believe this is an additional useful demarcation for how to group > the code. And coincided somewhat with server/client. > > Query parsing and the type system are client. Query execution and HBase > interface are both client and server. > > -- Lars > > On Wednesday, April 14, 2021, 8:56:08 AM PDT, Istvan Toth < > st...@apache.org> wrote: > > > > > > Jacob, Josh and me had a discussion about the topic. > > I'm attaching the dependency graph of the proposed modules > > > > On Fri, Apr 9, 2021 at 6:30 AM Istvan Toth <st...@cloudera.com> wrote: > > The bulk of the changes I'm working on is indeed the separation of the > client and the server side code. > > > > Separating the MR related classes, and the tools-specific code (main, > options parsing, etc) makes sense to me, if we don't mind adding another > module. > > > > In the first WIP iteration, I'm splitting out everything that depends on > more than hbase-client into a "server" module. > > Once that works I will look at splitting that further into a real > "server" and an "MR/tools" module. > > > > > > My initial estimates about splitting the server side code were way too > optimistic, we have to touch a lot of code to break circular dependencies > between the client and server side. The changes are still quite trivial, > but the patch is going to be huge and scary. > > > > > > Tests are also going to be a problem, we're probably going to have to > move most of them into the "server" or a separate "tests" module, as the > MiniCluster tests depend on code from each module. > > > > The plan in PHOENIX-5483, and Lars's mail sounds good, but I think that > it would be more about dividing the "client-side" module further. > > (BTW I think that making the indexing engine available separately would > also be a popular feature ) > > > > > > > > On Fri, Apr 9, 2021 at 5:39 AM Daniel Wong <dbw...@apache.org> wrote: > >> This is another project I am interested in as well as my group at > >> Salesforce. We have had some discussions internally on this but I > wasn't > >> aware of this specific Spark issue (We only allow phoenix access via > spark > >> by default). I think the approaches outlined are a good initial step > but > >> we were also considering a larger breakup of phoenix-core. I don't > >> think the desire for the larger step should stop us from doing the > initial > >> ones Istavan and Josh proposed. I think the high level plan makes sense > >> but I might prefer a different name than phoenix-tools for the ones we > want > >> to be available to external libraries like phoenix-connectors. Another > >> possible alternative is to restructure maybe less invasively by making > >> phoenix core like your proposed tools and making a phoenix-internal or > >> similar for the future. > >> One thing I was wondering was how much effort it was to split > client/server > >> through phoenix-core... Lars layed out a good component view of phoenix > >> whosethe first step might be PHOENIx-5483 but we could focus on highest > >> level separation rather than bottom up. However, even that thread > linked > >> there talks about a client-facing api which we can piggyback for this > use. > >> Say phoeinx-public-api or similar. > >> > >> On Wed, Apr 7, 2021 at 9:43 AM Jacob Isaac <jacobpisaa...@gmail.com> > wrote: > >> > >>> Hi Josh & Istvan > >>> > >>> Thanks Istvan for looking into this, I am also interested in solving > this > >>> problem, > >>> Let me know how I can help? > >>> > >>> Thanks > >>> Jacob > >>> > >>> On Wed, Apr 7, 2021 at 9:05 AM Josh Elser <els...@apache.org> wrote: > >>> > >>> > Thanks for trying to tackle this sticky problem, Istvan. For the > context > >>> > of everyone else, the real-life problem Istvan is trying to fix is > that > >>> > you cannot run a Spark application with both HBase and Phoenix jars > on > >>> > the classpath. > >>> > > >>> > If I understand this correctly, it's that the HBase API signatures > are > >>> > different depending on whether we are "client side" or "server side" > >>> > (within a RegionServer). Your comment on PHOENIX-6053 shows that > >>> > (signatures on Table.java around Protobuf's Service class having > shaded > >>> > relocation vs. the original com.google.protobuf coordinates). > >>> > > >>> > I think the reason we have the monolithic phoenix-core is that we > have > >>> > so much logic which is executed on both the client and server side. > For > >>> > example, we may push a filter operation to the server-side or we many > >>> > run it client-side. That's also why we have the "thin" phoenix-server > >>> > Maven module which just re-packages phoenix-core. > >>> > > >>> > Is it possible that we change phoenix-server so that it contains the > >>> > "server-side" code that we don't want to have using the HBase classes > >>> > with thirdparty relocations, rather than introduce another new Maven > >>> > module? > >>> > > >>> > Looking through your WIP PR too. > >>> > > >>> > On 4/7/21 1:10 AM, Istvan Toth wrote: > >>> > > Hi! > >>> > > > >>> > > I've been working on getting Phoenix working with > >>> > hbase-shaded-client.jar, > >>> > > and I am finally getting traction. > >>> > > > >>> > > One of the issues that I encountered is that we are mixing client > and > >>> > > server side code in phoenix-core, and there's a > >>> > > mutual interdependence between the two. > >>> > > > >>> > > Fixing this is not hard, as it's mostly about replacing > >>> .class.getName() > >>> > s > >>> > > with string constants, and moving around some inconveniently placed > >>> > static > >>> > > utility methods, and now I have a WIP version where the client side > >>> > doesn't > >>> > > depend on server classes. > >>> > > > >>> > > However, unless we change the project structure, and factor out the > >>> > classes > >>> > > that depend on server-side APIs, this will be extremely fragile, > as any > >>> > > change can (and will) re-introduce the circular dependency between > the > >>> > > classes. > >>> > > > >>> > > To solve this issue I propose the following: > >>> > > > >>> > > - clean up phoenix-core, so that only classes that depend only > on > >>> > > *hbase-client* (or at worst only on classes that are present in > >>> > > *hbase-shaded-client*) remain. This should be 90+% of the code > >>> > > - move all classes (mostly coprocessors and their support code) > >>> that > >>> > use > >>> > > the server API (*hbase-server* mostly) to a new module, say > >>> > > phoenix-coprocessors (the phoenix-server module name is taken). > >>> This > >>> > new > >>> > > class depends on phoenix-core. > >>> > > - move all classes that directly depend on MapReduce, and their > >>> > main() > >>> > > classes to the existing phoenix-tools module (which also > depends on > >>> > core) > >>> > > > >>> > > The separation would be primarily based on API use, at the first > cut > >>> I'd > >>> > be > >>> > > fine with keeping all logic phoenix-core, and referencing that. We > may > >>> or > >>> > > may not want to move logic that is only used in coprocessors or > tools, > >>> > but > >>> > > doesn't use the respective APIs to the new modules later. > >>> > > > >>> > > As for the main artifacts: > >>> > > > >>> > > - *phoenix-server.jar* would include code from all three > classes. > >>> > > - A newly added *phoenix-client-byo-shaded-hbase.jar *would > include > >>> > only > >>> > > the code from cleaned-up phoenix-core > >>> > > - Ideally, we'd remove the the tools and coprocessor code (and > >>> > > dependencies) from the standard and embedded clients, and > switch > >>> > > documentation to use *phoenix-server* to run the MR tools, but > this > >>> > is > >>> > > optional. > >>> > > > >>> > > I am tracking this work in PHOENIX-6053, which has a (currently > >>> working) > >>> > > WIP patch attached. > >>> > > > >>> > > I think that this change would fit the pattern established by > creating > >>> > the > >>> > > phoenix-tools module, > >>> > > but as this is major change in project structure (even if the > actual > >>> Java > >>> > > changes are trivial), > >>> > > I'd like to gather your input on this approach (please also speak > up if > >>> > you > >>> > > agree). > >>> > > > >>> > > regards > >>> > > Istvan > >>> > > > >>> > > >>> > >> > > > > > > -- > > István Tóth | Staff Software Engineer > > > > st...@cloudera.com > > > > > > ________________________________ > > > >