There are some tools in the tests jar, such as PerformanceEvaluation.

Still, maybe those tools should be moved to main...

Istvan Toth <st...@apache.org> wrote on Tue, Mar 5, 2024 at 16:14:
>
> DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out
> to achieve this. This is about discussing whether we even want to make
> these changes.
> These are also substantial changes, but they could be targeted for HBase
> 3.0.
>
> One issue I have noticed is that we ship test jars and test dependencies in
> the assembly.
> I can't see anyone using those, but they bloat the assembly and classpath,
> and add unnecessary JARs with possible CVE issues (for example Kerby,
> which is a Hadoop minicluster dependency).
>
> My proposal is to exclude the test jars and the test scope dependencies
> from the assembly.
>
> The advantages would be:
> * Smaller distro size
> * Faster startup (this is marginal)
> * Fewer CVE-prone JARs in the binary assemblies
>
> The other issue is that the assembly includes much of the Hadoop
> distribution.
> The basic assumption in all scripts and instructions is that the node has a
> fully configured Hadoop installation, and we include it in the classpath of
> HBase.
>
> If that is true, then there is no reason to include Hadoop in the assembly;
> HBase and its direct dependencies should be enough.
>
> One could argue that it would simplify the client side, which is true to
> some extent (though 95% of the client distro use cases are served better by
> simply using hbase-shaded-client).
>
> We could remove the Hadoop libraries from one or both of the assemblies
> unconditionally, or provide two variants of one or both assemblies: one
> with Hadoop included and one without it.
> Spark already does this: it has binary distributions both with and without
> Hadoop.
>
> The advantages would be:
> * Smaller distro size
> * Faster startup (this is marginal)
> * Less chance of conflicts with the Hadoop jars
> * Fewer CVE-prone JARs in the binary assemblies
>
>
> Thirdly, we could consider excluding the
> full-fat org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
> binary assemblies. It is not used by the assembly, and AFAIK it is not
> included in any of the 'hbase classpath' command variants.
>
> This would make sure that no Hadoop libraries are included (even in shaded
> form) and would make the HBase distribution fully insulated from Hadoop's
> CVE issues.
>
> (The full-fat hbase-shaded-client works best as a direct build-time
> dependency anyway.)
>
> best regards
> Istvan
