Hi! This was originally discussed back in 2021, but changes in priorities meant that no progress was made on it until now, when Aron picked the patch up and polished it.
We are tracking this work in *PHOENIX-6053*. I have tried to give a (not so) quick summary of the change below, but the original discussion thread has more details, and I suggest reading that as well: https://lists.apache.org/thread/hs4klbc04n4gh62z17pznc0rkspjg6jx

*Motivation:*

The huge number of dependencies Phoenix has is an ongoing problem. To use the thick client, you either need to depend on it via Maven, which brings in dozens of large, complex and commonly used dependencies, or you need to use the shaded phoenix-client artifact, which includes every dependency and attempts to shade everything that can be shaded.

When going the unshaded route, you need to make sure that your application works with Phoenix's versions of the dependencies, or that your application's versions don't break Phoenix. When using the shaded artifact, this is less of an issue, but there are still cases where shading doesn't help, or causes additional problems. One such issue is that you cannot have any Hadoop or HBase libraries on the classpath, as they fail hard when shaded and unshaded (or at least not Phoenix-shaded) jars are mixed. Another issue is https://issues.apache.org/jira/browse/PHOENIX-6861, where a shading change broke PQS. The direct motivation for us was a project that needed to use Phoenix along with other Hadoop stack components, where we couldn't use phoenix-client because of the shading conflicts, and we couldn't use phoenix-core because of a protobuf (2.5.0) version conflict.

*Proposed solution:*

*STEP 1 (the current patch):*

Split the current phoenix-core module into two parts:
- *phoenix-core* retains all the code that is needed for the thick client, and excludes everything that uses either the hbase-server or mapreduce APIs.
- *phoenix-coprocessors* includes everything that is not needed for the thick client (i.e. server-side code), and/or depends on hbase-server or the mapreduce libraries.
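To make the intended dependency structure concrete, here is a hedged sketch of how a downstream application's pom.xml would pick up only the slimmed-down client code under this proposal. The artifact IDs and the `phoenix.version` property are taken from the proposal and are placeholders; the final names are still up for debate.

```xml
<!-- Hypothetical sketch: artifact IDs follow the proposal and may change. -->

<!-- A thick-client application would depend only on the new, slimmer core,
     which no longer pulls in hbase-server or the mapreduce libraries: -->
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-core</artifactId>
  <version>${phoenix.version}</version>
</dependency>

<!-- Server-side code would live in the new module, which depends on
     phoenix-core and additionally brings in hbase-server/mapreduce: -->
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-coprocessors</artifactId>
  <version>${phoenix.version}</version>
</dependency>
```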
phoenix-coprocessors of course depends on phoenix-core, and both *phoenix-client* and *phoenix-server* depend on both. This is easier said than done, as there are a lot of circular dependencies between these modules which need to be broken.

*STEP 2 (future patch):*

Introduce a new artifact, *phoenix-client-lite*, which includes neither the phoenix-coprocessors code nor its dependencies (hbase-server, mapreduce). This is mostly a size optimization; last I checked it shaves ~30 megabytes off the current phoenix-client jar size. This would be the one that most "normal" applications, including phoenix-sqlline, would add to their classpath.

Introduce a new artifact, *phoenix-client-byo-hbase*, which is modelled after *hbase-client-byo-hadoop*. This one includes phoenix-core and its direct non-HBase dependencies, but uses HBase (and Hadoop/MR) from the *hbase-client* or *hbase-client-byo-hadoop* jars. We need to make some changes to shading to cover the differences between the standard HBase API and the shaded hbase-client API. This solves the Hadoop/HBase coexistence problem.

*STEP 3 (future patch):*

To solve the various classpath issues, the current connectors use custom shading. The Spark connector in particular needs to coexist with hbase-client on the Spark classpath, and requires the same shading changes that *phoenix-client-byo-hbase* does. The Hive connector ultimately needs to get the same treatment (see PHOENIX-6939; we've done that downstream a long time ago). Instead of the current custom-shaded 70-80 MB JARs, these could be a few dozen kilobytes, containing only the actual connector code and depending on *phoenix-client-byo-hbase*. Unfortunately, this is complicated by the Phoenix mapreduce code depending on hbase-server.

*Open questions:*

*Pacing:* Does the three-step plan above make sense, or should we split it further, or consolidate it into one or two steps?
Perhaps doing STEP 1 and 2 in one patch would let us test whether the new artifacts indeed work as expected before committing the changes.

*Naming:* Both the module names and the package names introduced are up for debate. For now, we've renamed org.apache.phoenix.coprocessor to org.apache.phoenix.coprocessorclient on the client side, to avoid having a package named "coprocessor" in the client, and had to invent new names for some classes, but we are open to any ideas. As for the modules, *phoenix-client* and *phoenix-server* are already taken, so we went with *phoenix-coprocessors*, but better names are welcome, either for these or for the new shaded artifacts.

*Mapreduce:* My original idea was to split the Phoenix mapreduce code into a third module, so that the connectors could depend only on that one and not on the phoenix-coprocessors module. However, the snapshot handling depends on hbase-server, and IIRC there were some other non-trivial dependencies on the server-side code in them. My current thought is that a separate mapreduce module is not needed. The Phoenix mapred jobs can be run with just the hbase command, which adds hbase-server to the classpath anyway. The connectors deal directly with HFiles, so they are not expected to run outside the cluster, and we can just add the extra dependencies to the classpath, i.e.: *hbase-client* (coming from Spark/Hive), *phoenix-client-byo-hbase*, *phoenix-coprocessors*, *hbase-server* and the minimal connector JAR. I would especially welcome insight on this one.

*5.1:* My original plan was to leave 5.1 alone. However, as 5.1 is shaping up to be a longer-term supported version (at least in CLDR), I increasingly find myself warming up to the idea of backporting these changes. First, it would make later backports much less of a pain. Second, it doesn't break any public (or semi-public) API or change behaviour; we're just moving internal code around. Third, we could make the new feature available faster, in 5.1.4 or 5.1.5.
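As a hedged illustration of the Mapreduce classpath idea above, the on-cluster launch could look roughly like the sketch below. The jar names and paths are placeholders following the proposed artifact names, not shipped files, and the exact mechanism (HADOOP_CLASSPATH vs. `--jars`, etc.) would depend on the host framework.

```shell
# Hypothetical sketch: jar names/paths follow the proposal and are placeholders.
# hbase-client is assumed to come from the Spark/Hive runtime classpath;
# `hbase classpath` adds hbase-server and friends, as noted above.
export HADOOP_CLASSPATH="/opt/phoenix/phoenix-client-byo-hbase.jar:\
/opt/phoenix/phoenix-coprocessors.jar:\
/opt/phoenix/phoenix-spark-connector.jar:\
$(hbase classpath)"
```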
Looking forward to your feedback, either on the topics above or in general. Josh, Jacob, Daniel and Lars have contributed to the previous discussion, and I hope to receive input both from them and from everyone else who hasn't participated yet.

Best regards,
Istvan