I agree, we don't want to omit those from the binary distro.
We should identify what those tools are. (That should be easy, based on the
presence of a main() method or the Tool interface.)
Such tools could either be moved into a new module, like hbase-tools, or
simply moved to the runtime JARs.
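
A rough sketch of how such a scan could look, in Python. This is only a
heuristic under stated assumptions: it searches a test source tree for a
main() method or an "implements ... Tool" clause. The function name
find_tool_candidates and both regexes are made up for illustration, and the
scan would miss tools that only inherit Tool through a base class.

```python
import re
from pathlib import Path

# Heuristics only (assumptions, not an HBase convention):
# a conventional main() entry point, or a class declaration whose
# "implements" clause names Tool before the opening brace.
MAIN_RE = re.compile(r"public\s+static\s+void\s+main\s*\(")
TOOL_RE = re.compile(r"\bimplements\b[^{]*\bTool\b")


def find_tool_candidates(test_src_root):
    """Return sorted relative paths of .java files that look like runnable tools."""
    root = Path(test_src_root)
    hits = []
    for path in root.rglob("*.java"):
        text = path.read_text(encoding="utf-8", errors="replace")
        if MAIN_RE.search(text) or TOOL_RE.search(text):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

Calling find_tool_candidates() on a module's src/test/java directory would
give a first list of candidates to review by hand; the result still needs a
manual pass, since plenty of test helpers also have a main().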

Istvan

On Tue, Mar 5, 2024 at 10:34 AM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:

> There are some tools in the tests jar, such as PerformanceEvaluation.
>
> But anyway, maybe they should be moved to main...
>
> Istvan Toth <st...@apache.org> wrote on Tue, Mar 5, 2024 at 16:14:
> >
> > DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out
> > to achieve this; this is about discussing whether we even want to make
> > these changes.
> > These are also substantial changes, but they could be targeted for HBase
> > 3.0.
> >
> > One issue I have noticed is that we ship test jars and test dependencies
> > in the assembly.
> > I can't see anyone using those, but it bloats the assembly and classpath,
> > and adds unnecessary JARs with possible CVE issues. (for example Kerby
> > which is a Hadoop minicluster dependency)
> >
> > My proposal is to exclude the test jars and the test scope dependencies
> > from the assembly.
> >
> > The advantages would be:
> > * Smaller distro size
> > * Faster startup (this is marginal)
> > * Fewer CVE-prone JARs in the binary assemblies
> >
> > The other issue is that the assembly includes much of the Hadoop
> > distribution.
> > The basic assumption in all scripts and instructions is that the node has
> > a fully configured Hadoop installation, and we include it in the classpath
> > of HBase.
> >
> > If that is true, then there is no reason to include Hadoop in the
> > assembly; HBase and its direct dependencies should be enough.
> >
> > One could argue that it would simplify the client side, which is true to
> > some extent (though 95% of the client distro use cases are served better
> > by simply using hbase-shaded-client).
> >
> > We could either remove the Hadoop libraries from one or both of the
> > assemblies unconditionally, or provide two variants of one or both
> > assemblies, one with Hadoop included and one without it.
> > Spark already does this: it has binary distributions both with and
> > without Hadoop.
> >
> > The advantages would be:
> > * Smaller distro size
> > * Faster startup (this is marginal)
> > * Less chance of conflicts with the Hadoop jars
> > * Fewer CVE-prone JARs in the binary assemblies
> >
> >
> > Thirdly, we could consider excluding the
> > full-fat org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
> > binary assemblies. It is not used by the assembly, and AFAIK it is not
> > included in any of the 'hbase classpath' command variants.
> >
> > This would make sure that no Hadoop libraries are included (even in
> > shaded form) and would make the HBase distribution fully insulated from
> > Hadoop's CVE issues.
> >
> > (The full-fat hbase-shaded-client works best as direct build-time
> > dependency anyway)
> >
> > best regards
> > Istvan
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
